fullNHANES_recat <- read_csv(here("cleaned_data","fullNHANES_recat.csv"))
## New names:
## • `` -> `...1`
## Rows: 22349 Columns: 30
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (10): fpl, age, gender, refED, refEDspouse, childED, adultED, ethnicity,...
## dbl (20): ...1, year, WTINT2YR, SDMVPSU, SDMVSTRA, DMDEDUC3, DMDEDUC2, DMDHR...
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Before we can start our analyses, we need to use the svydesign function from the “survey” package written by Thomas Lumley. The svydesign function tells R about the design elements in the survey. Once this command has been issued, all that needs to be done for the analyses is to use the object that contains this information in each command. Because the 2001-2016 NHANES data were released with a sampling weight (WTINT2YR), a PSU variable (SDMVPSU) and a strata variable (SDMVSTRA), we will use these in our svydesign function.
nhc <- svydesign(id=~SDMVPSU, weights=~WTINT2YR,strata=~SDMVSTRA, nest=TRUE, survey.lonely.psu = "adjust", data=fullNHANES_recat)
nhc
## Stratified 1 - level Cluster Sampling design (with replacement)
## With (244) clusters.
## svydesign(id = ~SDMVPSU, weights = ~WTINT2YR, strata = ~SDMVSTRA,
## nest = TRUE, survey.lonely.psu = "adjust", data = fullNHANES_recat)
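The role of each design element can be seen on a small made-up data set. The sketch below is an illustration only: the data frame `toy` and its columns are hypothetical, not NHANES. It also sets lonely-PSU handling through `options()`, which is how the survey package documents that setting.

```r
library(survey)

# Lonely-PSU handling is documented as a global option in the survey package.
options(survey.lonely.psu = "adjust")

# Hypothetical toy data: 4 strata, 2 PSUs per stratum, 10 respondents per PSU.
set.seed(1)
toy <- data.frame(
  strata = rep(1:4, each = 20),
  psu    = rep(rep(1:2, each = 10), times = 4),  # PSU ids restart in each stratum
  wt     = runif(80, min = 500, max = 1500),     # made-up sampling weights
  y      = rnorm(80, mean = 40, sd = 10)         # made-up outcome
)

# nest = TRUE tells svydesign that PSU ids are only unique within a stratum.
toy_design <- svydesign(id = ~psu, strata = ~strata, weights = ~wt,
                        nest = TRUE, data = toy)

toy_design               # prints the design description
svymean(~y, toy_design)  # design-based mean and standard error
```

Because the PSU ids 1 and 2 are reused in every stratum, leaving out `nest = TRUE` would make svydesign treat them as the same two clusters appearing in several strata.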
summary(nhc)
## Stratified 1 - level Cluster Sampling design (with replacement)
## With (244) clusters.
## svydesign(id = ~SDMVPSU, weights = ~WTINT2YR, strata = ~SDMVSTRA,
## nest = TRUE, survey.lonely.psu = "adjust", data = fullNHANES_recat)
## Probabilities:
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 4.278e-06 2.534e-05 4.579e-05 6.387e-05 8.363e-05 7.468e-04
## Stratum Sizes:
## 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30
## obs 173 195 184 216 167 226 202 205 179 221 191 177 190 218 157 169 220
## design.PSU 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
## actual.PSU 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
## 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47
## obs 183 154 184 279 178 177 171 177 159 171 152 183 140 206 177 200 155
## design.PSU 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
## actual.PSU 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
## 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64
## obs 152 236 190 158 177 153 189 175 166 175 129 173 200 244 188 184 205
## design.PSU 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
## actual.PSU 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
## 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81
## obs 185 185 236 195 151 127 134 146 91 74 229 211 244 216 196 194 213
## design.PSU 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
## actual.PSU 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
## 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98
## obs 195 189 178 154 225 147 154 74 240 268 239 152 181 207 171 151 204
## design.PSU 2 2 2 2 3 2 2 2 3 3 3 2 2 2 2 2 2
## actual.PSU 2 2 2 2 3 2 2 2 3 3 3 2 2 2 2 2 2
## 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115
## obs 163 183 208 156 71 194 171 182 184 200 184 171 179 211 182 202 181
## design.PSU 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
## actual.PSU 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
## 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132
## obs 198 140 198 144 227 204 197 171 179 210 243 223 221 230 258 261 270
## design.PSU 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
## actual.PSU 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
## 133
## obs 167
## design.PSU 2
## actual.PSU 2
## Data variables:
## [1] "...1" "year" "WTINT2YR" "SDMVPSU" "SDMVSTRA"
## [6] "DMDEDUC3" "DMDEDUC2" "DMDHREDU" "DMDHSEDU" "RIDAGEYR"
## [11] "RIAGENDR" "RIDRETH1" "INDFMPIR" "DMDYRSUS" "DMDCITZN"
## [16] "URXMEP" "fpl" "age" "gender" "persWeight"
## [21] "psu" "strata" "refED" "refEDspouse" "childED"
## [26] "adultED" "ethnicity" "citizenship" "yearsUS" "monoEthyl"
Complex survey data are unique. With survey data, you (almost) never get to delete any cases from the data set, even if you will never use them in any of your analyses. Instead, the survey package has two options that allow you to correctly analyze subpopulations of your survey data.
These options are ‘svyby’ and ‘subset.survey.design’.
The subset.survey.design option is sort of like deleting unwanted cases (without really deleting them, of course), and the svyby option is very similar to by-group processing in that the results are shown for each group of the by-variable.
There are two formulas that can be used to calculate the standard errors.
One formula is used when you do by-group processing or delete unwanted cases from the dataset, and survey statisticians call this the conditional approach. This is used when members of the subpopulation cannot appear in certain strata and therefore those strata should not be used in the calculation of the standard error. In practice, this rarely happens in public-use complex survey datasets. One reason is that the analyst usually does not know which combination of variables defines a particular stratum.
The other formula is used when you use the svyby option, and survey statisticians call this the unconditional approach. This is used when members of the subpopulation can be in any of the strata, even if there are some strata in the sample data that do not contain any members of the subpopulation.
Because members of the subpopulation can be in any of the strata, all of the strata need to be used in the calculation of the standard error, and hence all of the data must be in the dataset.
If the data set is subset (meaning that observations not to be included in the subpopulation are deleted from the data set), the standard errors of the estimates cannot be calculated correctly. When the svyby option is used, only the cases defined by the subpopulation are used in the calculation of the estimate, but all cases are used in the calculation of the standard errors.
[For more information on this issue, please see Sampling Techniques, Third Edition by William G. Cochran (1977) and Small Area Estimation by J. N. K. Rao (2003). A nice description of this issue is given in Brady West’s presentation at the 2009 Stata Conference (in Washington, D.C.).]
Both svyby and subset.survey.design use the formula for the unconditional standard errors.
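The difference between subsetting the design and deleting rows can be sketched on the `api` example data that ships with the survey package (an assumption for illustration; these are not the NHANES data). The point estimates agree, but only the first approach keeps the full design information for the standard error.

```r
library(survey)
data(api)  # example data bundled with the survey package

# One-stage cluster sample of school districts.
dclus1 <- svydesign(id = ~dnum, weights = ~pw, data = apiclus1, fpc = ~fpc)

# Recommended: subset the design object, so the full design is retained.
elem_design  <- subset(dclus1, stype == "E")
est_subset   <- svymean(~api00, elem_design)

# Not recommended: delete rows first, then declare the design.
elem_rows    <- subset(apiclus1, stype == "E")
wrong_design <- svydesign(id = ~dnum, weights = ~pw, data = elem_rows, fpc = ~fpc)
est_filtered <- svymean(~api00, wrong_design)

est_subset    # same mean as est_filtered; the SE may differ
est_filtered

# svyby reports every level of the by-variable in one call.
by_type <- svyby(~api00, ~stype, dclus1, svymean)
by_type
```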
svymean(~RIDAGEYR, nhc)
## mean SE
## RIDAGEYR 38.744 0.2599
URXMEP contains missing values, so we need to tell R to skip them with na.rm = TRUE.
svymean(~URXMEP, nhc, na.rm = TRUE)
## mean SE
## URXMEP 269.81 9.4418
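A small sketch of what `na.rm = TRUE` changes, again using the survey package's bundled `api` data with missing values introduced artificially (the variable `api00_na` is hypothetical and exists only for this illustration):

```r
library(survey)
data(api)

dclus1 <- svydesign(id = ~dnum, weights = ~pw, data = apiclus1, fpc = ~fpc)

# Blank out the first 10 values to mimic a variable with missing data.
dclus1 <- update(dclus1, api00_na = replace(api00, 1:10, NA))

with_na    <- svymean(~api00_na, dclus1)                # missing values propagate
without_na <- svymean(~api00_na, dclus1, na.rm = TRUE)  # missing values skipped

with_na
without_na
```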
The variable gender is the subpopulation variable.
svyby(~RIDAGEYR, ~gender, nhc, svymean)
| gender | RIDAGEYR | se |
|---|---|---|
| female | 39.5 | 0.294 |
| male | 38 | 0.326 |
DMDEDUC3 codes: Primary = 0:8, Secondary = 9:15.
svyby(~DMDEDUC3, ~age, nhc, svymean, na.rm = TRUE)
| age | DMDEDUC3 | se |
|---|---|---|
| child | 6.06 | 0.0827 |
| middle-aged | 0 | 0 |
| older adult | 0 | 0 |
| young adult | 14.3 | 0.325 |
You can use more than one by-variable. To do so, put + between the variables.
svyby(~RIDAGEYR, ~refED+gender, nhc, svymean)
| refED | gender | RIDAGEYR | se |
|---|---|---|---|
| college and beyond | female | 39.8 | 0.53 |
| partial college and below | female | 39.4 | 0.315 |
| college and beyond | male | 40 | 0.549 |
| partial college and below | male | 37.4 | 0.354 |
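The same pattern can be tried on the survey package's bundled `api` data (used here only as a stand-in for NHANES); each combination of the two by-variables gets its own row.

```r
library(survey)
data(api)

dclus1 <- svydesign(id = ~dnum, weights = ~pw, data = apiclus1, fpc = ~fpc)

# Two by-variables joined with "+": school type and an improvement flag.
by_two <- svyby(~api00, ~stype + comp.imp, dclus1, svymean)
by_two
```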
In the next example, three by-variables are used.
svyby(~log(monoEthyl), ~refED+citizenship+gender, nhc, svymean, na.rm = TRUE)
| refED | citizenship | gender | log(monoEthyl) | se |
|---|---|---|---|---|
| college and beyond | birth or naturalization | female | 3.88 | 0.0561 |
| partial college and below | birth or naturalization | female | 4.3 | 0.0335 |
| college and beyond | not U,S, citizen | female | 3.77 | 0.146 |
| partial college and below | not U,S, citizen | female | 4.55 | 0.0716 |
| college and beyond | birth or naturalization | male | 3.89 | 0.0452 |
| partial college and below | birth or naturalization | male | 4.25 | 0.0314 |
| college and beyond | not U,S, citizen | male | 3.95 | 0.148 |
| partial college and below | not U,S, citizen | male | 4.67 | 0.0663 |
Sometimes you don’t want so much output. Rather, you just want the output for a specific group. You can get this by creating a subpopulation of the data with the subset function. In the example below, we obtain the output only for males.
smale <- subset(nhc,gender == "male")
summary(smale)
## Stratified 1 - level Cluster Sampling design (with replacement)
## With (244) clusters.
## subset(nhc, gender == "male")
## Probabilities:
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 4.675e-06 2.594e-05 4.641e-05 6.351e-05 8.457e-05 5.394e-04
## Stratum Sizes:
## 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33
## obs 86 87 92 100 85 110 103 95 101 111 88 89 81 114 79 78 111 89 68 92
## design.PSU 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
## actual.PSU 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
## 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55
## obs 135 84 80 78 70 85 81 74 89 78 106 88 86 74 78 119 98 77 97 70 86 85
## design.PSU 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
## actual.PSU 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
## 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76
## obs 88 93 64 85 99 116 98 98 103 83 87 117 101 78 62 66 74 40 32 112 110
## design.PSU 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
## actual.PSU 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
## 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96
## obs 120 108 97 96 115 103 85 100 79 119 72 76 41 123 133 126 65 92 97 83
## design.PSU 2 2 2 2 2 2 2 2 2 3 2 2 2 3 3 3 2 2 2 2
## actual.PSU 2 2 2 2 2 2 2 2 2 3 2 2 2 3 3 3 2 2 2 2
## 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113
## obs 75 101 70 100 113 84 33 101 73 85 86 97 88 89 78 95 86
## design.PSU 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
## actual.PSU 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
## 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130
## obs 97 91 96 64 92 61 121 93 102 75 76 122 121 111 103 111 124
## design.PSU 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
## actual.PSU 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
## 131 132 133
## obs 132 147 79
## design.PSU 2 2 2
## actual.PSU 2 2 2
## Data variables:
## [1] "...1" "year" "WTINT2YR" "SDMVPSU" "SDMVSTRA"
## [6] "DMDEDUC3" "DMDEDUC2" "DMDHREDU" "DMDHSEDU" "RIDAGEYR"
## [11] "RIAGENDR" "RIDRETH1" "INDFMPIR" "DMDYRSUS" "DMDCITZN"
## [16] "URXMEP" "fpl" "age" "gender" "persWeight"
## [21] "psu" "strata" "refED" "refEDspouse" "childED"
## [26] "adultED" "ethnicity" "citizenship" "yearsUS" "monoEthyl"
svymean(~RIDAGEYR,design=smale)
## mean SE
## RIDAGEYR 37.967 0.3256
schild <- subset(nhc,age == "child")
summary(schild)
## Stratified 1 - level Cluster Sampling design (with replacement)
## With (244) clusters.
## subset(nhc, age == "child")
## Probabilities:
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 7.386e-06 3.978e-05 7.794e-05 8.869e-05 1.160e-04 4.555e-04
## Stratum Sizes:
## 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36
## obs 64 75 78 76 54 75 96 57 69 72 88 71 81 67 71 52 82 52 56 77 92 59 77
## design.PSU 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
## actual.PSU 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
## 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59
## obs 70 67 62 72 65 81 56 58 69 72 50 66 90 56 76 51 69 78 78 69 84 53 41
## design.PSU 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
## actual.PSU 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
## 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82
## obs 43 73 49 41 69 67 62 50 53 30 47 48 33 25 29 69 59 62 56 55 50 59 70
## design.PSU 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
## actual.PSU 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
## 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103
## obs 50 45 44 62 47 56 15 66 83 79 33 65 40 56 48 67 45 59 42 63 19
## design.PSU 2 2 2 3 2 2 2 3 3 3 2 2 2 2 2 2 2 2 2 2 2
## actual.PSU 2 2 2 3 2 2 2 3 3 3 2 2 2 2 2 2 2 2 2 2 2
## 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120
## obs 41 47 59 76 65 57 57 58 50 47 56 52 79 40 70 47 119
## design.PSU 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
## actual.PSU 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
## 121 122 123 124 125 126 127 128 129 130 131 132 133
## obs 94 83 70 70 74 126 89 104 109 120 141 121 49
## design.PSU 2 2 2 2 2 2 2 2 2 2 2 2 2
## actual.PSU 2 2 2 2 2 2 2 2 2 2 2 2 2
## Data variables:
## [1] "...1" "year" "WTINT2YR" "SDMVPSU" "SDMVSTRA"
## [6] "DMDEDUC3" "DMDEDUC2" "DMDHREDU" "DMDHSEDU" "RIDAGEYR"
## [11] "RIAGENDR" "RIDRETH1" "INDFMPIR" "DMDYRSUS" "DMDCITZN"
## [16] "URXMEP" "fpl" "age" "gender" "persWeight"
## [21] "psu" "strata" "refED" "refEDspouse" "childED"
## [26] "adultED" "ethnicity" "citizenship" "yearsUS" "monoEthyl"
svymean(~log(monoEthyl), design = schild, na.rm = TRUE)
## mean SE
## log(monoEthyl) 3.9152 0.0329
syadult <- subset(nhc,age == "young adult")
summary(syadult)
## Stratified 1 - level Cluster Sampling design (with replacement)
## With (244) clusters.
## subset(nhc, age == "young adult")
## Probabilities:
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 4.971e-06 2.032e-05 3.711e-05 5.714e-05 5.743e-05 7.468e-04
## Stratum Sizes:
## 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36
## obs 7 19 23 23 15 36 17 17 18 20 20 24 16 23 19 18 15 22 16 13 21 17 20
## design.PSU 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
## actual.PSU 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
## 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59
## obs 19 23 7 26 20 28 23 17 13 24 19 17 32 9 15 29 14 25 11 21 25 12 11
## design.PSU 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
## actual.PSU 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
## 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82
## obs 12 15 18 11 21 13 15 21 22 15 5 10 23 10 6 20 14 18 31 23 23 17 19
## design.PSU 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
## actual.PSU 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
## 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103
## obs 20 22 11 34 12 9 12 26 24 20 16 25 15 23 16 27 8 11 46 15 7
## design.PSU 2 2 2 3 2 2 2 3 3 3 2 2 2 2 2 2 2 2 2 2 2
## actual.PSU 2 2 2 3 2 2 2 3 3 3 2 2 2 2 2 2 2 2 2 2 2
## 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120
## obs 13 19 22 12 17 18 10 12 22 17 23 19 15 9 20 13 13
## design.PSU 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
## actual.PSU 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
## 121 122 123 124 125 126 127 128 129 130 131 132 133
## obs 17 15 8 10 18 19 9 8 13 7 16 21 19
## design.PSU 2 2 2 2 2 2 2 2 2 2 2 2 2
## actual.PSU 2 2 2 2 2 2 2 2 2 2 2 2 2
## Data variables:
## [1] "...1" "year" "WTINT2YR" "SDMVPSU" "SDMVSTRA"
## [6] "DMDEDUC3" "DMDEDUC2" "DMDHREDU" "DMDHSEDU" "RIDAGEYR"
## [11] "RIAGENDR" "RIDRETH1" "INDFMPIR" "DMDYRSUS" "DMDCITZN"
## [16] "URXMEP" "fpl" "age" "gender" "persWeight"
## [21] "psu" "strata" "refED" "refEDspouse" "childED"
## [26] "adultED" "ethnicity" "citizenship" "yearsUS" "monoEthyl"
svymean(~log(monoEthyl), design = syadult, na.rm = TRUE)
## mean SE
## log(monoEthyl) 4.3766 0.053
smid <- subset(nhc,age == "middle-aged")
summary(smid)
## Stratified 1 - level Cluster Sampling design (with replacement)
## With (244) clusters.
## subset(nhc, age == "middle-aged")
## Probabilities:
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 4.278e-06 1.553e-05 3.661e-05 4.409e-05 5.261e-05 7.468e-04
## Stratum Sizes:
## 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35
## obs 69 79 62 87 80 80 59 89 65 89 65 64 75 79 54 62 80 69 61 68 101 60
## design.PSU 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
## actual.PSU 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
## 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58
## obs 63 63 60 59 55 59 57 41 93 73 64 48 53 77 89 55 81 56 65 63 60 56 48
## design.PSU 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
## actual.PSU 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
## 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79
## obs 77 81 106 79 99 87 92 74 109 91 90 53 58 61 43 35 108 91 112 106 97
## design.PSU 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
## actual.PSU 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
## 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100
## obs 84 82 84 88 89 71 95 64 65 39 102 131 111 73 61 105 74 70 73 83 87
## design.PSU 2 2 2 2 2 2 3 2 2 2 3 3 3 2 2 2 2 2 2 2 2
## actual.PSU 2 2 2 2 2 2 3 2 2 2 3 3 3 2 2 2 2 2 2 2 2
## 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117
## obs 86 65 34 90 73 76 78 88 66 87 93 96 80 87 85 78 69
## design.PSU 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
## actual.PSU 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
## 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133
## obs 87 58 77 75 83 69 77 76 73 87 79 89 91 74 91 76
## design.PSU 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
## actual.PSU 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
## Data variables:
## [1] "...1" "year" "WTINT2YR" "SDMVPSU" "SDMVSTRA"
## [6] "DMDEDUC3" "DMDEDUC2" "DMDHREDU" "DMDHSEDU" "RIDAGEYR"
## [11] "RIAGENDR" "RIDRETH1" "INDFMPIR" "DMDYRSUS" "DMDCITZN"
## [16] "URXMEP" "fpl" "age" "gender" "persWeight"
## [21] "psu" "strata" "refED" "refEDspouse" "childED"
## [26] "adultED" "ethnicity" "citizenship" "yearsUS" "monoEthyl"
svymean(~log(monoEthyl), design = smid, na.rm = TRUE)
## mean SE
## log(monoEthyl) 4.2507 0.0303
soadult <- subset(nhc,age == "older adult")
summary(soadult)
## Stratified 1 - level Cluster Sampling design (with replacement)
## With (244) clusters.
## subset(nhc, age == "older adult")
## Probabilities:
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 4.675e-06 2.679e-05 4.917e-05 6.462e-05 8.518e-05 5.199e-04
## Stratum Sizes:
## 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36
## obs 33 22 21 30 18 35 30 42 27 40 18 18 18 49 13 37 43 40 21 26 65 42 17
## design.PSU 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
## actual.PSU 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
## 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59
## obs 19 27 31 18 8 17 20 38 22 40 38 16 37 36 12 16 14 21 23 16 10 16 44
## design.PSU 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
## actual.PSU 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
## 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82
## obs 64 50 42 33 28 13 34 56 29 16 22 18 29 13 4 32 47 52 23 21 37 55 22
## design.PSU 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
## actual.PSU 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
## 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103
## obs 31 22 28 34 24 24 8 46 30 29 30 30 47 18 17 37 27 26 34 13 11
## design.PSU 2 2 2 3 2 2 2 3 3 3 2 2 2 2 2 2 2 2 2 2 2
## actual.PSU 2 2 2 3 2 2 2 3 3 3 2 2 2 2 2 2 2 2 2 2 2
## 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120
## obs 50 32 25 18 30 43 17 16 43 38 36 25 26 22 21 26 18
## design.PSU 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
## actual.PSU 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
## 121 122 123 124 125 126 127 128 129 130 131 132 133
## obs 18 16 24 22 42 25 38 30 19 40 30 37 23
## design.PSU 2 2 2 2 2 2 2 2 2 2 2 2 2
## actual.PSU 2 2 2 2 2 2 2 2 2 2 2 2 2
## Data variables:
## [1] "...1" "year" "WTINT2YR" "SDMVPSU" "SDMVSTRA"
## [6] "DMDEDUC3" "DMDEDUC2" "DMDHREDU" "DMDHSEDU" "RIDAGEYR"
## [11] "RIAGENDR" "RIDRETH1" "INDFMPIR" "DMDYRSUS" "DMDCITZN"
## [16] "URXMEP" "fpl" "age" "gender" "persWeight"
## [21] "psu" "strata" "refED" "refEDspouse" "childED"
## [26] "adultED" "ethnicity" "citizenship" "yearsUS" "monoEthyl"
svymean(~log(monoEthyl), design = soadult, na.rm = TRUE)
## mean SE
## log(monoEthyl) 4.1572 0.0451
A wide variety of statistical models can be run with complex survey data.
With only a few exceptions, the results of these analyses can be interpreted just as the results from the same analyses with experimental or quasi-experimental data.
For example, if you run an OLS regression with weighted data, assuming that the sampling plan has been correctly specified, the regression coefficients are interpreted exactly as any other OLS regression coefficient.
The same is true for the various logistic regression models, including binary logistic regression, ordinal logistic regression and multinomial logistic regression (of which there is not an example in this workshop).
Most of the assumptions of these models are also the same. However, some assumptions, such as the assumption regarding the normality of the residuals in OLS regression, are often not meaningful because of the large sample size commonly seen with complex survey data.
svyttest(log(monoEthyl)~0, nhc, na.rm = TRUE)
##
## Design-based one-sample t-test
##
## data: log(monoEthyl) ~ 0
## t = 173.02, df = 123, p-value < 2.2e-16
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
## 4.134424 4.230118
## sample estimates:
## mean
## 4.182271
svyttest(log(monoEthyl)~refED, nhc)
##
## Design-based t-test
##
## data: log(monoEthyl) ~ refED
## t = 9.2962, df = 123, p-value = 6.764e-16
## alternative hypothesis: true difference in mean is not equal to 0
## 95 percent confidence interval:
## 0.3289704 0.5069672
## sample estimates:
## difference in mean
## 0.4179688
As you probably know, an independent-samples t-test tests the null hypothesis that the difference in the means of the two groups is 0. Another way to think about this type of t-test is to think of it as a linear regression with a single binary predictor. The intercept will be the mean of the reference group, and the coefficient will be the difference between the two groups.
We will start by running the t-test function as before, and then replicate the results using the svyglm function, which can be used to run a linear regression. The svyby function is used with the covmat argument to save the elements to a matrix so that we can use the svycontrast function to subtract the values.
The purpose of this example is not to belabor the point about a t-test, but rather to show how to get a matrix of values and then compare those values with the svycontrast function in a simple example where the answer is already known.
svyttest(RIDAGEYR~gender, nhc)
##
## Design-based t-test
##
## data: RIDAGEYR ~ gender
## t = -4.4888, df = 123, p-value = 1.625e-05
## alternative hypothesis: true difference in mean is not equal to 0
## 95 percent confidence interval:
## -2.1819236 -0.8464863
## sample estimates:
## difference in mean
## -1.514205
summary(svyglm(RIDAGEYR~gender, design=nhc))
##
## Call:
## svyglm(formula = RIDAGEYR ~ gender, design = nhc)
##
## Survey design:
## svydesign(id = ~SDMVPSU, weights = ~WTINT2YR, strata = ~SDMVSTRA,
## nest = TRUE, survey.lonely.psu = "adjust", data = fullNHANES_recat)
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 39.4808 0.2937 134.421 < 2e-16 ***
## gendermale -1.5142 0.3373 -4.489 1.62e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for gaussian family taken to be 436.5408)
##
## Number of Fisher Scoring iterations: 2
a <- svyby(~RIDAGEYR, ~gender, nhc, na.rm.by = TRUE, svymean, covmat = TRUE)
vcov(a)
## female male
## female 0.08626588 0.03923044
## male 0.03923044 0.10598469
svycontrast(a, c( -1, 1))
## contrast SE
## contrast -1.5142 0.3373
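The three-way agreement shown above can be reproduced as a self-contained check. This sketch uses the `api` data bundled with the survey package instead of NHANES, and the object names are illustrative.

```r
library(survey)
data(api)

dclus1 <- svydesign(id = ~dnum, weights = ~pw, data = apiclus1, fpc = ~fpc)

# 1. Design-based two-sample t-test.
tt <- svyttest(api00 ~ comp.imp, dclus1)

# 2. The same comparison as a regression with one binary predictor.
fit <- svyglm(api00 ~ comp.imp, design = dclus1)

# 3. Group means with their covariance matrix, then a contrast (second - first).
means        <- svyby(~api00, ~comp.imp, dclus1, svymean, covmat = TRUE)
contrast_est <- svycontrast(means, c(-1, 1))

tt$estimate    # difference in means from the t-test
coef(fit)[2]   # regression slope for the binary predictor
contrast_est   # contrast of the two svyby means
```

All three quantities estimate the same difference in weighted means, so they should match up to rounding.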
We need to use the summary function to get the standard errors, test statistics and p-values. Let’s start with a model that has no interaction terms.
The outcome variable will be log(monoEthyl), and the predictors will be age and refED.
summary(svyglm(log(monoEthyl)~age+refED, design=nhc, na.action = na.omit))
##
## Call:
## svyglm(formula = log(monoEthyl) ~ age + refED, design = nhc,
## na.action = na.omit)
##
## Survey design:
## svydesign(id = ~SDMVPSU, weights = ~WTINT2YR, strata = ~SDMVSTRA,
## nest = TRUE, survey.lonely.psu = "adjust", data = fullNHANES_recat)
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.59738 0.04730 76.059 < 2e-16 ***
## agemiddle-aged 0.36360 0.03096 11.745 < 2e-16 ***
## ageolder adult 0.24964 0.05355 4.662 8.19e-06 ***
## ageyoung adult 0.44357 0.05550 7.992 9.21e-13 ***
## refEDpartial college and below 0.42697 0.04492 9.504 2.61e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for gaussian family taken to be 2.444453)
##
## Number of Fisher Scoring iterations: 2
Now let’s add an interaction between the two predictor variables, age and the reference person’s education (refED).
summary(svyglm(log(monoEthyl)~age*refED, design=nhc, na.action = na.omit))
##
## Call:
## svyglm(formula = log(monoEthyl) ~ age * refED, design = nhc,
## na.action = na.omit)
##
## Survey design:
## svydesign(id = ~SDMVPSU, weights = ~WTINT2YR, strata = ~SDMVSTRA,
## nest = TRUE, survey.lonely.psu = "adjust", data = fullNHANES_recat)
##
## Coefficients:
## Estimate Std. Error t value
## (Intercept) 3.56852 0.06150 58.022
## agemiddle-aged 0.40919 0.06808 6.011
## ageolder adult 0.23411 0.09114 2.569
## ageyoung adult 0.48141 0.11815 4.074
## refEDpartial college and below 0.46565 0.06444 7.226
## agemiddle-aged:refEDpartial college and below -0.06306 0.07855 -0.803
## ageolder adult:refEDpartial college and below 0.02168 0.09871 0.220
## ageyoung adult:refEDpartial college and below -0.04994 0.12388 -0.403
## Pr(>|t|)
## (Intercept) < 2e-16 ***
## agemiddle-aged 2.15e-08 ***
## ageolder adult 0.0115 *
## ageyoung adult 8.43e-05 ***
## refEDpartial college and below 5.48e-11 ***
## agemiddle-aged:refEDpartial college and below 0.4237
## ageolder adult:refEDpartial college and below 0.8266
## ageyoung adult:refEDpartial college and below 0.6876
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for gaussian family taken to be 2.444224)
##
## Number of Fisher Scoring iterations: 2
glm1 <- (svyglm(log(monoEthyl)~gender+refED, design=nhc, na.action = na.omit))
glm1
## Stratified 1 - level Cluster Sampling design (with replacement)
## With (244) clusters.
## svydesign(id = ~SDMVPSU, weights = ~WTINT2YR, strata = ~SDMVSTRA,
## nest = TRUE, survey.lonely.psu = "adjust", data = fullNHANES_recat)
##
## Call: svyglm(formula = log(monoEthyl) ~ gender + refED, design = nhc,
## na.action = na.omit)
##
## Coefficients:
## (Intercept) gendermale
## 3.89173 -0.01296
## refEDpartial college and below
## 0.41771
##
## Degrees of Freedom: 20686 Total (i.e. Null); 122 Residual
## (1662 observations deleted due to missingness)
## Null Deviance: 51790
## Residual Deviance: 51050 AIC: 84630
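This example is just like the previous one, only here factor notation is used. Wrapping a predictor in factor() matters when a categorical predictor is stored as numeric codes, especially when it has more than two levels; character variables such as gender and ethnicity are already treated as categorical by R.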
summary(svyglm(log(monoEthyl)~factor(gender)+factor(ethnicity), design=nhc, na.action = na.omit))
##
## Call:
## svyglm(formula = log(monoEthyl) ~ factor(gender) + factor(ethnicity),
## design = nhc, na.action = na.omit)
##
## Survey design:
## svydesign(id = ~SDMVPSU, weights = ~WTINT2YR, strata = ~SDMVSTRA,
## nest = TRUE, survey.lonely.psu = "adjust", data = fullNHANES_recat)
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.393113 0.049860 88.109 < 2e-16 ***
## factor(gender)male -0.009393 0.027889 -0.337 0.7368
## factor(ethnicity)Non-Hispanic Black 0.513229 0.065652 7.817 2.41e-12 ***
## factor(ethnicity)Non-Hispanic White -0.354993 0.061970 -5.728 7.77e-08 ***
## factor(ethnicity)Other Hispanic 0.141275 0.074951 1.885 0.0619 .
## factor(ethnicity)Other or Multi -0.631021 0.083302 -7.575 8.53e-12 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for gaussian family taken to be 2.404013)
##
## Number of Fisher Scoring iterations: 2
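To see why factor notation matters for numerically coded predictors, the sketch below recodes a three-level variable as integers. It uses the survey package's bundled `api` data, and `stype_code` is a hypothetical variable created only for this illustration.

```r
library(survey)
data(api)

dclus1 <- svydesign(id = ~dnum, weights = ~pw, data = apiclus1, fpc = ~fpc)

# Store the three school types as integer codes 1, 2, 3.
dclus1 <- update(dclus1, stype_code = as.integer(stype))

# Without factor(): the codes are treated as one continuous predictor.
fit_numeric <- svyglm(api00 ~ stype_code, design = dclus1)

# With factor(): one coefficient per non-reference level.
fit_factor <- svyglm(api00 ~ factor(stype_code), design = dclus1)

coef(fit_numeric)  # intercept + a single slope
coef(fit_factor)   # intercept + two level contrasts
```

The first model forces the difference between codes 1 and 2 to equal the difference between codes 2 and 3, which is rarely what is intended for unordered categories.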
modela <- svyglm(log(monoEthyl)~refED+age+gender+ethnicity+fpl+citizenship, design=nhc, na.action = na.omit)
summ(modela)
## MODEL INFO:
## Observations: 19218
## Dependent Variable: log(monoEthyl)
## Type: Survey-weighted linear regression
##
## MODEL FIT:
## R² = 0.06
## Adj. R² = -0.09
##
## Standard errors: Robust
## --------------------------------------------------------------
## Est. S.E. t val. p
## ------------------------------- ------- ------ -------- ------
## (Intercept) 3.71 0.08 49.10 0.00
## refEDpartial college and 0.35 0.05 7.77 0.00
## below
## agemiddle-aged 0.36 0.03 10.78 0.00
## ageolder adult 0.35 0.06 6.08 0.00
## ageyoung adult 0.44 0.05 8.16 0.00
## gendermale -0.01 0.03 -0.17 0.86
## ethnicityNon-Hispanic 0.57 0.07 8.37 0.00
## Black
## ethnicityNon-Hispanic -0.29 0.06 -4.45 0.00
## White
## ethnicityOther Hispanic 0.15 0.08 1.92 0.06
## ethnicityOther or Multi -0.57 0.08 -6.84 0.00
## fplfamily income 2x poverty 0.03 0.04 0.70 0.48
## threshold
## fplfamily income 3x poverty 0.06 0.05 1.10 0.27
## threshold
## fplfamily income 4x poverty 0.11 0.06 1.83 0.07
## threshold
## fplfamily income 5x poverty 0.12 0.06 1.87 0.06
## threshold
## fplfamily income more than 0.09 0.06 1.55 0.13
## 5x poverty threshold
## citizenshipnot U,S, 0.17 0.06 2.91 0.00
## citizen
## --------------------------------------------------------------
##
## Estimated dispersion parameter = 2.37
summ(modela, robust = "HC1") #robust standard errors
## Warning in summ.svyglm(modela, robust = "HC1"): Robust standard errors are reported by default
## in the survey package.
## MODEL INFO:
## Observations: 19218
## Dependent Variable: log(monoEthyl)
## Type: Survey-weighted linear regression
##
## MODEL FIT:
## R² = 0.06
## Adj. R² = -0.09
##
## Standard errors: Robust
## --------------------------------------------------------------
## Est. S.E. t val. p
## ------------------------------- ------- ------ -------- ------
## (Intercept) 3.71 0.08 49.10 0.00
## refEDpartial college and 0.35 0.05 7.77 0.00
## below
## agemiddle-aged 0.36 0.03 10.78 0.00
## ageolder adult 0.35 0.06 6.08 0.00
## ageyoung adult 0.44 0.05 8.16 0.00
## gendermale -0.01 0.03 -0.17 0.86
## ethnicityNon-Hispanic 0.57 0.07 8.37 0.00
## Black
## ethnicityNon-Hispanic -0.29 0.06 -4.45 0.00
## White
## ethnicityOther Hispanic 0.15 0.08 1.92 0.06
## ethnicityOther or Multi -0.57 0.08 -6.84 0.00
## fplfamily income 2x poverty 0.03 0.04 0.70 0.48
## threshold
## fplfamily income 3x poverty 0.06 0.05 1.10 0.27
## threshold
## fplfamily income 4x poverty 0.11 0.06 1.83 0.07
## threshold
## fplfamily income 5x poverty 0.12 0.06 1.87 0.06
## threshold
## fplfamily income more than 0.09 0.06 1.55 0.13
## 5x poverty threshold
## citizenshipnot U,S, 0.17 0.06 2.91 0.00
## citizen
## --------------------------------------------------------------
##
## Estimated dispersion parameter = 2.37
summ(modela, confint = TRUE, digits = 3) # confidence intervals are often more informative than p-values; summ() reports 95% CIs by default
## MODEL INFO:
## Observations: 19218
## Dependent Variable: log(monoEthyl)
## Type: Survey-weighted linear regression
##
## MODEL FIT:
## R² = 0.060
## Adj. R² = -0.090
##
## Standard errors: Robust
## ---------------------------------------------------------------------------
## Est. 2.5% 97.5% t val. p
## ------------------------------- -------- -------- -------- -------- -------
## (Intercept) 3.714 3.564 3.864 49.101 0.000
## refEDpartial college and 0.355 0.264 0.445 7.767 0.000
## below
## agemiddle-aged 0.363 0.296 0.430 10.776 0.000
## ageolder adult 0.350 0.236 0.464 6.078 0.000
## ageyoung adult 0.443 0.335 0.551 8.156 0.000
## gendermale -0.005 -0.063 0.053 -0.174 0.862
## ethnicityNon-Hispanic 0.569 0.434 0.704 8.365 0.000
## Black
## ethnicityNon-Hispanic -0.287 -0.415 -0.159 -4.455 0.000
## White
## ethnicityOther Hispanic 0.146 -0.004 0.297 1.923 0.057
## ethnicityOther or Multi -0.566 -0.730 -0.402 -6.839 0.000
## fplfamily income 2x poverty 0.031 -0.056 0.118 0.701 0.485
## threshold
## fplfamily income 3x poverty 0.061 -0.048 0.170 1.105 0.272
## threshold
## fplfamily income 4x poverty 0.108 -0.009 0.224 1.830 0.070
## threshold
## fplfamily income 5x poverty 0.118 -0.007 0.243 1.867 0.065
## threshold
## fplfamily income more than 0.095 -0.027 0.216 1.545 0.125
## 5x poverty threshold
## citizenshipnot U,S, 0.169 0.054 0.283 2.911 0.004
## citizen
## ---------------------------------------------------------------------------
##
## Estimated dispersion parameter = 2.372
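Because the outcome is log(monoEthyl), each coefficient is a difference on the log scale. A common way to read such a coefficient is to exponentiate it, which gives an adjusted ratio of geometric means relative to the reference category. The sketch below does this for the refED estimate and its 95% interval. The three numbers are typed in by hand from the output above, and the object names are illustrative, not part of the original analysis.

```r
# Estimate and 95% CI for refED (partial college and below), copied from the
# summ(modela, confint = TRUE, digits = 3) output above.
log_scale <- c(estimate = 0.355, lower = 0.264, upper = 0.445)

# exp() turns a difference in log(monoEthyl) into a ratio of geometric means.
ratio <- exp(log_scale)

# The same quantity expressed as a percent difference from the reference group.
pct_diff <- (ratio - 1) * 100

print(round(ratio, 2))
print(round(pct_diff, 1))
```

With the fitted object available, exp(coef(modela)) and exp(confint(modela)) should give the same kind of quantities for every term at once.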
summ(modela, confint = TRUE, pvals = FALSE) # drop the p-values altogether
## MODEL INFO:
## Observations: 19218
## Dependent Variable: log(monoEthyl)
## Type: Survey-weighted linear regression
##
## MODEL FIT:
## R² = 0.06
## Adj. R² = -0.09
##
## Standard errors: Robust
## ----------------------------------------------------------------
## Est. 2.5% 97.5% t val.
## ------------------------------- ------- ------- ------- --------
## (Intercept) 3.71 3.56 3.86 49.10
## refEDpartial college and 0.35 0.26 0.45 7.77
## below
## agemiddle-aged 0.36 0.30 0.43 10.78
## ageolder adult 0.35 0.24 0.46 6.08
## ageyoung adult 0.44 0.34 0.55 8.16
## gendermale -0.01 -0.06 0.05 -0.17
## ethnicityNon-Hispanic 0.57 0.43 0.70 8.37
## Black
## ethnicityNon-Hispanic -0.29 -0.42 -0.16 -4.45
## White
## ethnicityOther Hispanic 0.15 -0.00 0.30 1.92
## ethnicityOther or Multi -0.57 -0.73 -0.40 -6.84
## fplfamily income 2x poverty 0.03 -0.06 0.12 0.70
## threshold
## fplfamily income 3x poverty 0.06 -0.05 0.17 1.10
## threshold
## fplfamily income 4x poverty 0.11 -0.01 0.22 1.83
## threshold
## fplfamily income 5x poverty 0.12 -0.01 0.24 1.87
## threshold
## fplfamily income more than 0.09 -0.03 0.22 1.55
## 5x poverty threshold
## citizenshipnot U,S, 0.17 0.05 0.28 2.91
## citizen
## ----------------------------------------------------------------
##
## Estimated dispersion parameter = 2.37
# THE GRAPH
plot_summs(modela)
plot_summs(modela, robust = TRUE)
plot_summs(modela, inner_ci_level = .9)
# plot coefficient uncertainty as normal distributions
plot_summs(modela, plot.distributions = TRUE, inner_ci_level = .9)
# table output for Word and RMarkdown documents
## standard errors are shown in parentheses
export_summs(modela, scale = TRUE)
| | Model 1 |
|---|---|
| (Intercept) | 3.71 *** |
| | (0.08) |
| refEDpartial college and below | 0.35 *** |
| | (0.05) |
| agemiddle-aged | 0.36 *** |
| | (0.03) |
| ageolder adult | 0.35 *** |
| | (0.06) |
| ageyoung adult | 0.44 *** |
| | (0.05) |
| gendermale | -0.01 |
| | (0.03) |
| ethnicityNon-Hispanic Black | 0.57 *** |
| | (0.07) |
| ethnicityNon-Hispanic White | -0.29 *** |
| | (0.06) |
| ethnicityOther Hispanic | 0.15 |
| | (0.08) |
| ethnicityOther or Multi | -0.57 *** |
| | (0.08) |
| fplfamily income 2x poverty threshold | 0.03 |
| | (0.04) |
| fplfamily income 3x poverty threshold | 0.06 |
| | (0.05) |
| fplfamily income 4x poverty threshold | 0.11 |
| | (0.06) |
| fplfamily income 5x poverty threshold | 0.12 |
| | (0.06) |
| fplfamily income more than 5x poverty threshold | 0.09 |
| | (0.06) |
| citizenshipnot U,S, citizen | 0.17 ** |
| | (0.06) |
| N | 19218 |
| R2 | 0.06 |
| All continuous predictors are mean-centered and scaled by 1 standard deviation. *** p < 0.001; ** p < 0.01; * p < 0.05. | |
# confidence intervals instead of standard errors
export_summs(modela, scale = TRUE,
error_format = "[{conf.low}, {conf.high}]")
| | Model 1 |
|---|---|
| (Intercept) | 3.71 *** |
| | [3.56, 3.86] |
| refEDpartial college and below | 0.35 *** |
| | [0.26, 0.45] |
| agemiddle-aged | 0.36 *** |
| | [0.30, 0.43] |
| ageolder adult | 0.35 *** |
| | [0.24, 0.46] |
| ageyoung adult | 0.44 *** |
| | [0.34, 0.55] |
| gendermale | -0.01 |
| | [-0.06, 0.05] |
| ethnicityNon-Hispanic Black | 0.57 *** |
| | [0.43, 0.70] |
| ethnicityNon-Hispanic White | -0.29 *** |
| | [-0.42, -0.16] |
| ethnicityOther Hispanic | 0.15 |
| | [-0.00, 0.30] |
| ethnicityOther or Multi | -0.57 *** |
| | [-0.73, -0.40] |
| fplfamily income 2x poverty threshold | 0.03 |
| | [-0.06, 0.12] |
| fplfamily income 3x poverty threshold | 0.06 |
| | [-0.05, 0.17] |
| fplfamily income 4x poverty threshold | 0.11 |
| | [-0.01, 0.22] |
| fplfamily income 5x poverty threshold | 0.12 |
| | [-0.01, 0.24] |
| fplfamily income more than 5x poverty threshold | 0.09 |
| | [-0.03, 0.22] |
| citizenshipnot U,S, citizen | 0.17 ** |
| | [0.05, 0.28] |
| N | 19218 |
| R2 | 0.06 |
| All continuous predictors are mean-centered and scaled by 1 standard deviation. *** p < 0.001; ** p < 0.01; * p < 0.05. | |
foresta <- plot_summs(
point.size = 3,
fontsize=8,
colors = "darkseagreen3",
modela, coefs = c("Household Education Partial College and Below
College and Beyond (ref)" = "refEDpartial college and below",
"Age: Middle-Aged
Child (ref)" = "agemiddle-aged",
"Age: Older Adult
Child (ref)" = "ageolder adult",
"Age: Young Adult
Child (ref)" = "ageyoung adult",
"Gender: Male
Gender: Female (ref)" = "gendermale",
"Ethnicity: Non-Hispanic Black
Mexican American (ref)" = "ethnicityNon-Hispanic Black",
"Ethnicity: Non-Hispanic White
Mexican American (ref)" = "ethnicityNon-Hispanic White",
"Ethnicity: Other Hispanic
Mexican American (ref)" = "ethnicityOther Hispanic",
"Ethnicity: Other or Multi
Mexican American (ref)" = "ethnicityOther or Multi",
"Family Income to Poverty Ratio: 2x Poverty threshold
At poverty threshold (ref)" = "fplfamily income 2x poverty threshold",
"Family Income to Poverty Ratio: 3x Poverty threshold
At poverty threshold (ref)" = "fplfamily income 3x poverty threshold",
"Family Income to Poverty Ratio: 4x Poverty threshold
At poverty threshold (ref)" = "fplfamily income 4x poverty threshold",
"Family Income to Poverty Ratio: 5x Poverty threshold
At poverty threshold (ref)" = "fplfamily income 5x poverty threshold",
"Family Income to Poverty Ratio: more than 5x Poverty threshold
At poverty threshold (ref)" = "fplfamily income more than 5x poverty threshold",
"Citizenship Status: Not U.S. Citizen
U.S. Citizen by birth or naturalization (ref)" = "citizenshipnot U,S, citizen"),
scale = TRUE, robust = TRUE)
foresta
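The label vector passed to coefs is long, and a copy of it is needed for every model that gets a forest plot. One way to keep the copies consistent is to write the lookup once and filter it by the terms each model actually contains. The sketch below uses only base R. The names all_labels and labels_for are hypothetical, only three entries are shown, and the labels are shortened to one line each.

```r
# Hypothetical lookup: display label -> coefficient name, written once.
# The coefficient names on the right are copied from the model output above.
all_labels <- c(
  "Gender: Male" = "gendermale",
  "Age: Young Adult" = "ageyoung adult",
  "Citizenship Status: Not U.S. Citizen" = "citizenshipnot U,S, citizen"
)

# Hypothetical helper: keep only the labels whose terms appear in a model.
labels_for <- function(term_names, lookup = all_labels) {
  lookup[lookup %in% term_names]
}

# Stand-in for the term names of a model without a gender term.
terms_without_gender <- c("(Intercept)", "ageyoung adult",
                          "citizenshipnot U,S, citizen")

print(labels_for(terms_without_gender))
```

With a fitted model, the term names could be taken from names(coef(modela)) and the result passed as the coefs argument of plot_summs.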
modelb <- svyglm(log(monoEthyl)~refED+age+ethnicity+fpl+citizenship, design=nhc, na.action = na.omit)
summ(modelb)
## MODEL INFO:
## Observations: 19218
## Dependent Variable: log(monoEthyl)
## Type: Survey-weighted linear regression
##
## MODEL FIT:
## R² = 0.06
## Adj. R² = -0.08
##
## Standard errors: Robust
## --------------------------------------------------------------
## Est. S.E. t val. p
## ------------------------------- ------- ------ -------- ------
## (Intercept) 3.71 0.07 51.36 0.00
## refEDpartial college and 0.35 0.05 7.77 0.00
## below
## agemiddle-aged 0.36 0.03 10.78 0.00
## ageolder adult 0.35 0.06 6.09 0.00
## ageyoung adult 0.44 0.05 8.15 0.00
## ethnicityNon-Hispanic 0.57 0.07 8.39 0.00
## Black
## ethnicityNon-Hispanic -0.29 0.06 -4.45 0.00
## White
## ethnicityOther Hispanic 0.15 0.08 1.92 0.06
## ethnicityOther or Multi -0.57 0.08 -6.83 0.00
## fplfamily income 2x poverty 0.03 0.04 0.69 0.49
## threshold
## fplfamily income 3x poverty 0.06 0.06 1.09 0.28
## threshold
## fplfamily income 4x poverty 0.11 0.06 1.82 0.07
## threshold
## fplfamily income 5x poverty 0.12 0.06 1.85 0.07
## threshold
## fplfamily income more than 0.09 0.06 1.53 0.13
## 5x poverty threshold
## citizenshipnot U,S, 0.17 0.06 2.90 0.00
## citizen
## --------------------------------------------------------------
##
## Estimated dispersion parameter = 2.37
summ(modelb, robust = "HC1") # robust = "HC1" is redundant for svyglm: robust standard errors are already reported by default (see the warning below)
## Warning in summ.svyglm(modelb, robust = "HC1"): Robust standard errors are reported by default
## in the survey package.
## MODEL INFO:
## Observations: 19218
## Dependent Variable: log(monoEthyl)
## Type: Survey-weighted linear regression
##
## MODEL FIT:
## R² = 0.06
## Adj. R² = -0.08
##
## Standard errors: Robust
## --------------------------------------------------------------
## Est. S.E. t val. p
## ------------------------------- ------- ------ -------- ------
## (Intercept) 3.71 0.07 51.36 0.00
## refEDpartial college and 0.35 0.05 7.77 0.00
## below
## agemiddle-aged 0.36 0.03 10.78 0.00
## ageolder adult 0.35 0.06 6.09 0.00
## ageyoung adult 0.44 0.05 8.15 0.00
## ethnicityNon-Hispanic 0.57 0.07 8.39 0.00
## Black
## ethnicityNon-Hispanic -0.29 0.06 -4.45 0.00
## White
## ethnicityOther Hispanic 0.15 0.08 1.92 0.06
## ethnicityOther or Multi -0.57 0.08 -6.83 0.00
## fplfamily income 2x poverty 0.03 0.04 0.69 0.49
## threshold
## fplfamily income 3x poverty 0.06 0.06 1.09 0.28
## threshold
## fplfamily income 4x poverty 0.11 0.06 1.82 0.07
## threshold
## fplfamily income 5x poverty 0.12 0.06 1.85 0.07
## threshold
## fplfamily income more than 0.09 0.06 1.53 0.13
## 5x poverty threshold
## citizenshipnot U,S, 0.17 0.06 2.90 0.00
## citizen
## --------------------------------------------------------------
##
## Estimated dispersion parameter = 2.37
summ(modelb, confint = TRUE, digits = 3) # Confidence intervals are often more informative than p-values; confint = TRUE requests them (95% by default).
## MODEL INFO:
## Observations: 19218
## Dependent Variable: log(monoEthyl)
## Type: Survey-weighted linear regression
##
## MODEL FIT:
## R² = 0.060
## Adj. R² = -0.080
##
## Standard errors: Robust
## ---------------------------------------------------------------------------
## Est. 2.5% 97.5% t val. p
## ------------------------------- -------- -------- -------- -------- -------
## (Intercept) 3.712 3.569 3.855 51.356 0.000
## refEDpartial college and 0.355 0.264 0.445 7.765 0.000
## below
## agemiddle-aged 0.363 0.296 0.430 10.782 0.000
## ageolder adult 0.350 0.236 0.464 6.089 0.000
## ageyoung adult 0.443 0.335 0.551 8.155 0.000
## ethnicityNon-Hispanic 0.570 0.435 0.704 8.388 0.000
## Black
## ethnicityNon-Hispanic -0.287 -0.415 -0.159 -4.452 0.000
## White
## ethnicityOther Hispanic 0.146 -0.004 0.297 1.923 0.057
## ethnicityOther or Multi -0.566 -0.730 -0.402 -6.834 0.000
## fplfamily income 2x poverty 0.031 -0.057 0.118 0.693 0.490
## threshold
## fplfamily income 3x poverty 0.060 -0.049 0.170 1.094 0.277
## threshold
## fplfamily income 4x poverty 0.107 -0.009 0.224 1.822 0.071
## threshold
## fplfamily income 5x poverty 0.118 -0.008 0.244 1.854 0.066
## threshold
## fplfamily income more than 0.094 -0.028 0.216 1.533 0.128
## 5x poverty threshold
## citizenshipnot U,S, 0.168 0.053 0.283 2.899 0.005
## citizen
## ---------------------------------------------------------------------------
##
## Estimated dispersion parameter = 2.372
summ(modelb, confint = TRUE, pvals = FALSE) # drop the p-values altogether
## MODEL INFO:
## Observations: 19218
## Dependent Variable: log(monoEthyl)
## Type: Survey-weighted linear regression
##
## MODEL FIT:
## R² = 0.06
## Adj. R² = -0.08
##
## Standard errors: Robust
## ----------------------------------------------------------------
## Est. 2.5% 97.5% t val.
## ------------------------------- ------- ------- ------- --------
## (Intercept) 3.71 3.57 3.86 51.36
## refEDpartial college and 0.35 0.26 0.45 7.77
## below
## agemiddle-aged 0.36 0.30 0.43 10.78
## ageolder adult 0.35 0.24 0.46 6.09
## ageyoung adult 0.44 0.34 0.55 8.15
## ethnicityNon-Hispanic 0.57 0.44 0.70 8.39
## Black
## ethnicityNon-Hispanic -0.29 -0.42 -0.16 -4.45
## White
## ethnicityOther Hispanic 0.15 -0.00 0.30 1.92
## ethnicityOther or Multi -0.57 -0.73 -0.40 -6.83
## fplfamily income 2x poverty 0.03 -0.06 0.12 0.69
## threshold
## fplfamily income 3x poverty 0.06 -0.05 0.17 1.09
## threshold
## fplfamily income 4x poverty 0.11 -0.01 0.22 1.82
## threshold
## fplfamily income 5x poverty 0.12 -0.01 0.24 1.85
## threshold
## fplfamily income more than 0.09 -0.03 0.22 1.53
## 5x poverty threshold
## citizenshipnot U,S, 0.17 0.05 0.28 2.90
## citizen
## ----------------------------------------------------------------
##
## Estimated dispersion parameter = 2.37
# THE GRAPH
plot_summs(modelb)
plot_summs(modelb, inner_ci_level = .9)
plot_summs(modelb, robust = TRUE)
# plot coefficient uncertainty as normal distributions
plot_summs(modelb, plot.distributions = TRUE, inner_ci_level = .9)
# table output for Word and RMarkdown documents
## standard errors are shown in parentheses
export_summs(modelb, scale = TRUE)
| | Model 1 |
|---|---|
| (Intercept) | 3.71 *** |
| | (0.07) |
| refEDpartial college and below | 0.35 *** |
| | (0.05) |
| agemiddle-aged | 0.36 *** |
| | (0.03) |
| ageolder adult | 0.35 *** |
| | (0.06) |
| ageyoung adult | 0.44 *** |
| | (0.05) |
| ethnicityNon-Hispanic Black | 0.57 *** |
| | (0.07) |
| ethnicityNon-Hispanic White | -0.29 *** |
| | (0.06) |
| ethnicityOther Hispanic | 0.15 |
| | (0.08) |
| ethnicityOther or Multi | -0.57 *** |
| | (0.08) |
| fplfamily income 2x poverty threshold | 0.03 |
| | (0.04) |
| fplfamily income 3x poverty threshold | 0.06 |
| | (0.06) |
| fplfamily income 4x poverty threshold | 0.11 |
| | (0.06) |
| fplfamily income 5x poverty threshold | 0.12 |
| | (0.06) |
| fplfamily income more than 5x poverty threshold | 0.09 |
| | (0.06) |
| citizenshipnot U,S, citizen | 0.17 ** |
| | (0.06) |
| N | 19218 |
| R2 | 0.06 |
| All continuous predictors are mean-centered and scaled by 1 standard deviation. *** p < 0.001; ** p < 0.01; * p < 0.05. | |
# confidence intervals instead of standard errors
export_summs(modelb, scale = TRUE,
error_format = "[{conf.low}, {conf.high}]")
| | Model 1 |
|---|---|
| (Intercept) | 3.71 *** |
| | [3.57, 3.86] |
| refEDpartial college and below | 0.35 *** |
| | [0.26, 0.45] |
| agemiddle-aged | 0.36 *** |
| | [0.30, 0.43] |
| ageolder adult | 0.35 *** |
| | [0.24, 0.46] |
| ageyoung adult | 0.44 *** |
| | [0.34, 0.55] |
| ethnicityNon-Hispanic Black | 0.57 *** |
| | [0.44, 0.70] |
| ethnicityNon-Hispanic White | -0.29 *** |
| | [-0.42, -0.16] |
| ethnicityOther Hispanic | 0.15 |
| | [-0.00, 0.30] |
| ethnicityOther or Multi | -0.57 *** |
| | [-0.73, -0.40] |
| fplfamily income 2x poverty threshold | 0.03 |
| | [-0.06, 0.12] |
| fplfamily income 3x poverty threshold | 0.06 |
| | [-0.05, 0.17] |
| fplfamily income 4x poverty threshold | 0.11 |
| | [-0.01, 0.22] |
| fplfamily income 5x poverty threshold | 0.12 |
| | [-0.01, 0.24] |
| fplfamily income more than 5x poverty threshold | 0.09 |
| | [-0.03, 0.22] |
| citizenshipnot U,S, citizen | 0.17 ** |
| | [0.05, 0.28] |
| N | 19218 |
| R2 | 0.06 |
| All continuous predictors are mean-centered and scaled by 1 standard deviation. *** p < 0.001; ** p < 0.01; * p < 0.05. | |
forestb <- plot_summs(
point.size = 3,
fontsize=8,
colors = "darkslateblue",
modelb, coefs = c("Household Education Partial College and Below
College and Beyond (ref)" = "refEDpartial college and below",
"Age: Middle-Aged
Child (ref)" = "agemiddle-aged",
"Age: Older Adult
Child (ref)" = "ageolder adult",
"Age: Young Adult
Child (ref)" = "ageyoung adult",
"Ethnicity: Non-Hispanic Black
Mexican American (ref)" = "ethnicityNon-Hispanic Black",
"Ethnicity: Non-Hispanic White
Mexican American (ref)" = "ethnicityNon-Hispanic White",
"Ethnicity: Other Hispanic
Mexican American (ref)" = "ethnicityOther Hispanic",
"Ethnicity: Other or Multi
Mexican American (ref)" = "ethnicityOther or Multi",
"Family Income to Poverty Ratio: 2x Poverty threshold
At poverty threshold (ref)" = "fplfamily income 2x poverty threshold",
"Family Income to Poverty Ratio: 3x Poverty threshold
At poverty threshold (ref)" = "fplfamily income 3x poverty threshold",
"Family Income to Poverty Ratio: 4x Poverty threshold
At poverty threshold (ref)" = "fplfamily income 4x poverty threshold",
"Family Income to Poverty Ratio: 5x Poverty threshold
At poverty threshold (ref)" = "fplfamily income 5x poverty threshold",
"Family Income to Poverty Ratio: more than 5x Poverty threshold
At poverty threshold (ref)" = "fplfamily income more than 5x poverty threshold",
"Citizenship Status: Not U.S. Citizen
U.S. Citizen by birth or naturalization (ref)" = "citizenshipnot U,S, citizen"),
scale = TRUE, robust = TRUE)
forestb
modelc <- svyglm(log(monoEthyl)~refED+age+ethnicity+fpl, design=nhc, na.action = na.omit)
summ(modelc)
## MODEL INFO:
## Observations: 19235
## Dependent Variable: log(monoEthyl)
## Type: Survey-weighted linear regression
##
## MODEL FIT:
## R² = 0.06
## Adj. R² = -0.07
##
## Standard errors: Robust
## --------------------------------------------------------------
## Est. S.E. t val. p
## ------------------------------- ------- ------ -------- ------
## (Intercept) 3.76 0.07 55.26 0.00
## refEDpartial college and 0.35 0.05 7.68 0.00
## below
## agemiddle-aged 0.37 0.03 11.28 0.00
## ageolder adult 0.36 0.06 6.16 0.00
## ageyoung adult 0.46 0.05 8.42 0.00
## ethnicityNon-Hispanic 0.52 0.07 7.97 0.00
## Black
## ethnicityNon-Hispanic -0.33 0.06 -5.45 0.00
## White
## ethnicityOther Hispanic 0.13 0.07 1.79 0.08
## ethnicityOther or Multi -0.58 0.08 -7.17 0.00
## fplfamily income 2x poverty 0.03 0.04 0.63 0.53
## threshold
## fplfamily income 3x poverty 0.06 0.05 1.02 0.31
## threshold
## fplfamily income 4x poverty 0.10 0.06 1.69 0.09
## threshold
## fplfamily income 5x poverty 0.11 0.06 1.72 0.09
## threshold
## fplfamily income more than 0.08 0.06 1.38 0.17
## 5x poverty threshold
## --------------------------------------------------------------
##
## Estimated dispersion parameter = 2.37
summ(modelc, robust = "HC1") # robust = "HC1" is redundant for svyglm: robust standard errors are already reported by default (see the warning below)
## Warning in summ.svyglm(modelc, robust = "HC1"): Robust standard errors are reported by default
## in the survey package.
## MODEL INFO:
## Observations: 19235
## Dependent Variable: log(monoEthyl)
## Type: Survey-weighted linear regression
##
## MODEL FIT:
## R² = 0.06
## Adj. R² = -0.07
##
## Standard errors: Robust
## --------------------------------------------------------------
## Est. S.E. t val. p
## ------------------------------- ------- ------ -------- ------
## (Intercept) 3.76 0.07 55.26 0.00
## refEDpartial college and 0.35 0.05 7.68 0.00
## below
## agemiddle-aged 0.37 0.03 11.28 0.00
## ageolder adult 0.36 0.06 6.16 0.00
## ageyoung adult 0.46 0.05 8.42 0.00
## ethnicityNon-Hispanic 0.52 0.07 7.97 0.00
## Black
## ethnicityNon-Hispanic -0.33 0.06 -5.45 0.00
## White
## ethnicityOther Hispanic 0.13 0.07 1.79 0.08
## ethnicityOther or Multi -0.58 0.08 -7.17 0.00
## fplfamily income 2x poverty 0.03 0.04 0.63 0.53
## threshold
## fplfamily income 3x poverty 0.06 0.05 1.02 0.31
## threshold
## fplfamily income 4x poverty 0.10 0.06 1.69 0.09
## threshold
## fplfamily income 5x poverty 0.11 0.06 1.72 0.09
## threshold
## fplfamily income more than 0.08 0.06 1.38 0.17
## 5x poverty threshold
## --------------------------------------------------------------
##
## Estimated dispersion parameter = 2.37
summ(modelc, confint = TRUE, digits = 3) # Confidence intervals are often more informative than p-values; confint = TRUE requests them (95% by default).
## MODEL INFO:
## Observations: 19235
## Dependent Variable: log(monoEthyl)
## Type: Survey-weighted linear regression
##
## MODEL FIT:
## R² = 0.059
## Adj. R² = -0.071
##
## Standard errors: Robust
## ---------------------------------------------------------------------------
## Est. 2.5% 97.5% t val. p
## ------------------------------- -------- -------- -------- -------- -------
## (Intercept) 3.763 3.628 3.898 55.256 0.000
## refEDpartial college and 0.351 0.260 0.441 7.680 0.000
## below
## agemiddle-aged 0.375 0.309 0.440 11.283 0.000
## ageolder adult 0.356 0.242 0.471 6.158 0.000
## ageyoung adult 0.455 0.348 0.562 8.415 0.000
## ethnicityNon-Hispanic 0.524 0.394 0.654 7.975 0.000
## Black
## ethnicityNon-Hispanic -0.335 -0.456 -0.213 -5.455 0.000
## White
## ethnicityOther Hispanic 0.134 -0.014 0.283 1.795 0.075
## ethnicityOther or Multi -0.582 -0.742 -0.421 -7.175 0.000
## fplfamily income 2x poverty 0.028 -0.060 0.115 0.626 0.532
## threshold
## fplfamily income 3x poverty 0.056 -0.052 0.164 1.024 0.308
## threshold
## fplfamily income 4x poverty 0.098 -0.017 0.213 1.687 0.094
## threshold
## fplfamily income 5x poverty 0.108 -0.017 0.232 1.717 0.089
## threshold
## fplfamily income more than 0.084 -0.037 0.205 1.379 0.171
## 5x poverty threshold
## ---------------------------------------------------------------------------
##
## Estimated dispersion parameter = 2.374
summ(modelc, confint = TRUE, pvals = FALSE) # drop the p-values altogether
## MODEL INFO:
## Observations: 19235
## Dependent Variable: log(monoEthyl)
## Type: Survey-weighted linear regression
##
## MODEL FIT:
## R² = 0.06
## Adj. R² = -0.07
##
## Standard errors: Robust
## ----------------------------------------------------------------
## Est. 2.5% 97.5% t val.
## ------------------------------- ------- ------- ------- --------
## (Intercept) 3.76 3.63 3.90 55.26
## refEDpartial college and 0.35 0.26 0.44 7.68
## below
## agemiddle-aged 0.37 0.31 0.44 11.28
## ageolder adult 0.36 0.24 0.47 6.16
## ageyoung adult 0.46 0.35 0.56 8.42
## ethnicityNon-Hispanic 0.52 0.39 0.65 7.97
## Black
## ethnicityNon-Hispanic -0.33 -0.46 -0.21 -5.45
## White
## ethnicityOther Hispanic 0.13 -0.01 0.28 1.79
## ethnicityOther or Multi -0.58 -0.74 -0.42 -7.17
## fplfamily income 2x poverty 0.03 -0.06 0.11 0.63
## threshold
## fplfamily income 3x poverty 0.06 -0.05 0.16 1.02
## threshold
## fplfamily income 4x poverty 0.10 -0.02 0.21 1.69
## threshold
## fplfamily income 5x poverty 0.11 -0.02 0.23 1.72
## threshold
## fplfamily income more than 0.08 -0.04 0.20 1.38
## 5x poverty threshold
## ----------------------------------------------------------------
##
## Estimated dispersion parameter = 2.37
# THE GRAPH
plot_summs(modelc)
plot_summs(modelc, inner_ci_level = .9)
plot_summs(modelc, robust = TRUE)
# plot coefficient uncertainty as normal distributions
plot_summs(modelc, plot.distributions = TRUE, inner_ci_level = .9)
# table output for Word and RMarkdown documents
## standard errors are shown in parentheses
export_summs(modelc, scale = TRUE)
| | Model 1 |
|---|---|
| (Intercept) | 3.76 *** |
| | (0.07) |
| refEDpartial college and below | 0.35 *** |
| | (0.05) |
| agemiddle-aged | 0.37 *** |
| | (0.03) |
| ageolder adult | 0.36 *** |
| | (0.06) |
| ageyoung adult | 0.46 *** |
| | (0.05) |
| ethnicityNon-Hispanic Black | 0.52 *** |
| | (0.07) |
| ethnicityNon-Hispanic White | -0.33 *** |
| | (0.06) |
| ethnicityOther Hispanic | 0.13 |
| | (0.07) |
| ethnicityOther or Multi | -0.58 *** |
| | (0.08) |
| fplfamily income 2x poverty threshold | 0.03 |
| | (0.04) |
| fplfamily income 3x poverty threshold | 0.06 |
| | (0.05) |
| fplfamily income 4x poverty threshold | 0.10 |
| | (0.06) |
| fplfamily income 5x poverty threshold | 0.11 |
| | (0.06) |
| fplfamily income more than 5x poverty threshold | 0.08 |
| | (0.06) |
| N | 19235 |
| R2 | 0.06 |
| All continuous predictors are mean-centered and scaled by 1 standard deviation. *** p < 0.001; ** p < 0.01; * p < 0.05. | |
# confidence intervals instead of standard errors
export_summs(modelc, scale = TRUE,
error_format = "[{conf.low}, {conf.high}]")
| | Model 1 |
|---|---|
| (Intercept) | 3.76 *** |
| | [3.63, 3.90] |
| refEDpartial college and below | 0.35 *** |
| | [0.26, 0.44] |
| agemiddle-aged | 0.37 *** |
| | [0.31, 0.44] |
| ageolder adult | 0.36 *** |
| | [0.24, 0.47] |
| ageyoung adult | 0.46 *** |
| | [0.35, 0.56] |
| ethnicityNon-Hispanic Black | 0.52 *** |
| | [0.39, 0.65] |
| ethnicityNon-Hispanic White | -0.33 *** |
| | [-0.46, -0.21] |
| ethnicityOther Hispanic | 0.13 |
| | [-0.01, 0.28] |
| ethnicityOther or Multi | -0.58 *** |
| | [-0.74, -0.42] |
| fplfamily income 2x poverty threshold | 0.03 |
| | [-0.06, 0.11] |
| fplfamily income 3x poverty threshold | 0.06 |
| | [-0.05, 0.16] |
| fplfamily income 4x poverty threshold | 0.10 |
| | [-0.02, 0.21] |
| fplfamily income 5x poverty threshold | 0.11 |
| | [-0.02, 0.23] |
| fplfamily income more than 5x poverty threshold | 0.08 |
| | [-0.04, 0.20] |
| N | 19235 |
| R2 | 0.06 |
| All continuous predictors are mean-centered and scaled by 1 standard deviation. *** p < 0.001; ** p < 0.01; * p < 0.05. | |
forestc <- plot_summs(
point.size = 3,
fontsize=8,
colors = "deepskyblue4",
modelc, coefs = c("Household Education Partial College and Below
College and Beyond (ref)" = "refEDpartial college and below",
"Age: Middle-Aged
Child (ref)" = "agemiddle-aged",
"Age: Older Adult
Child (ref)" = "ageolder adult",
"Age: Young Adult
Child (ref)" = "ageyoung adult",
"Ethnicity: Non-Hispanic Black
Mexican American (ref)" = "ethnicityNon-Hispanic Black",
"Ethnicity: Non-Hispanic White
Mexican American (ref)" = "ethnicityNon-Hispanic White",
"Ethnicity: Other Hispanic
Mexican American (ref)" = "ethnicityOther Hispanic",
"Ethnicity: Other or Multi
Mexican American (ref)" = "ethnicityOther or Multi",
"Family Income to Poverty Ratio: 2x Poverty threshold
At poverty threshold (ref)" = "fplfamily income 2x poverty threshold",
"Family Income to Poverty Ratio: 3x Poverty threshold
At poverty threshold (ref)" = "fplfamily income 3x poverty threshold",
"Family Income to Poverty Ratio: 4x Poverty threshold
At poverty threshold (ref)" = "fplfamily income 4x poverty threshold",
"Family Income to Poverty Ratio: 5x Poverty threshold
At poverty threshold (ref)" = "fplfamily income 5x poverty threshold",
"Family Income to Poverty Ratio: more than 5x Poverty threshold
At poverty threshold (ref)" = "fplfamily income more than 5x poverty threshold"),
scale = TRUE, robust = TRUE)
forestc
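With several nested models fitted, a single side-by-side table can be easier to compare than one table per model. export_summs accepts more than one model, and model.names labels the columns. The sketch below is not run on the NHANES data: so that it is self-contained it uses the api example data shipped with the survey package, and the object names fit_full and fit_reduced are placeholders. It assumes the survey, jtools and huxtable packages are installed.

```r
library(survey)
library(jtools)

# Example one-stage cluster sample bundled with the survey package
# (a stand-in for the NHANES design object nhc).
data(api)
dclus1 <- svydesign(id = ~dnum, weights = ~pw, data = apiclus1, fpc = ~fpc)

# Two nested survey-weighted linear models, analogous to dropping a term.
fit_full <- svyglm(api00 ~ ell + meals + mobility, design = dclus1)
fit_reduced <- svyglm(api00 ~ ell + meals, design = dclus1)

# One table with a column per model; intervals shown instead of standard errors.
tab <- export_summs(fit_full, fit_reduced,
                    model.names = c("Full", "Reduced"),
                    error_format = "[{conf.low}, {conf.high}]")
print(tab)
```

Applied here, the call would take modela, modelb and modelc in place of the two example fits.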
subset_adult <- subset(nhc, RIDAGEYR > 19)
model_adult <- svyglm(log(monoEthyl)~refED+age+gender+ethnicity+fpl+citizenship+adultED, design=subset_adult, na.action = na.omit)
summ(model_adult)
## MODEL INFO:
## Observations: 12132
## Dependent Variable: log(monoEthyl)
## Type: Survey-weighted linear regression
##
## MODEL FIT:
## R² = 0.06
## Adj. R² = -0.13
##
## Standard errors: Robust
## ---------------------------------------------------------------
## Est. S.E. t val. p
## -------------------------------- ------- ------ -------- ------
## (Intercept) 4.44 0.11 39.26 0.00
## refEDpartial college and 0.14 0.07 2.08 0.04
## below
## ageolder adult -0.02 0.05 -0.36 0.72
## ageyoung adult 0.08 0.06 1.33 0.19
## gendermale 0.03 0.04 0.94 0.35
## ethnicityNon-Hispanic 0.48 0.07 6.61 0.00
## Black
## ethnicityNon-Hispanic -0.38 0.07 -5.48 0.00
## White
## ethnicityOther Hispanic 0.06 0.08 0.71 0.48
## ethnicityOther or Multi -0.70 0.09 -7.60 0.00
## fplfamily income 2x poverty 0.07 0.05 1.35 0.18
## threshold
## fplfamily income 3x poverty 0.09 0.07 1.30 0.20
## threshold
## fplfamily income 4x poverty 0.15 0.07 2.10 0.04
## threshold
## fplfamily income 5x poverty 0.16 0.08 2.07 0.04
## threshold
## fplfamily income more than 0.18 0.07 2.52 0.01
## 5x poverty threshold
## citizenshipnot U,S, 0.15 0.07 2.17 0.03
## citizen
## adultEDcollege grad or -0.39 0.08 -5.25 0.00
## above
## adultEDhigh school -0.06 0.06 -1.11 0.27
## grad/GED
## adultEDless than 9th grade -0.19 0.07 -2.48 0.01
## adultEDsome college or AA -0.16 0.07 -2.36 0.02
## ---------------------------------------------------------------
##
## Estimated dispersion parameter = 2.49
summ(model_adult, robust = "HC1") # robust = "HC1" is redundant for svyglm: robust standard errors are already reported by default (see the warning below)
## Warning in summ.svyglm(model_adult, robust = "HC1"): Robust standard errors are reported by default
## in the survey package.
## MODEL INFO:
## Observations: 12132
## Dependent Variable: log(monoEthyl)
## Type: Survey-weighted linear regression
##
## MODEL FIT:
## R² = 0.06
## Adj. R² = -0.13
##
## Standard errors: Robust
## ---------------------------------------------------------------
## Est. S.E. t val. p
## -------------------------------- ------- ------ -------- ------
## (Intercept) 4.44 0.11 39.26 0.00
## refEDpartial college and 0.14 0.07 2.08 0.04
## below
## ageolder adult -0.02 0.05 -0.36 0.72
## ageyoung adult 0.08 0.06 1.33 0.19
## gendermale 0.03 0.04 0.94 0.35
## ethnicityNon-Hispanic 0.48 0.07 6.61 0.00
## Black
## ethnicityNon-Hispanic -0.38 0.07 -5.48 0.00
## White
## ethnicityOther Hispanic 0.06 0.08 0.71 0.48
## ethnicityOther or Multi -0.70 0.09 -7.60 0.00
## fplfamily income 2x poverty 0.07 0.05 1.35 0.18
## threshold
## fplfamily income 3x poverty 0.09 0.07 1.30 0.20
## threshold
## fplfamily income 4x poverty 0.15 0.07 2.10 0.04
## threshold
## fplfamily income 5x poverty 0.16 0.08 2.07 0.04
## threshold
## fplfamily income more than 0.18 0.07 2.52 0.01
## 5x poverty threshold
## citizenshipnot U,S, 0.15 0.07 2.17 0.03
## citizen
## adultEDcollege grad or -0.39 0.08 -5.25 0.00
## above
## adultEDhigh school -0.06 0.06 -1.11 0.27
## grad/GED
## adultEDless than 9th grade -0.19 0.07 -2.48 0.01
## adultEDsome college or AA -0.16 0.07 -2.36 0.02
## ---------------------------------------------------------------
##
## Estimated dispersion parameter = 2.49
summ(model_adult, confint = TRUE, digits = 3) # Confidence intervals are often more informative than p-values; confint = TRUE requests them (95% by default).
## MODEL INFO:
## Observations: 12132
## Dependent Variable: log(monoEthyl)
## Type: Survey-weighted linear regression
##
## MODEL FIT:
## R² = 0.057
## Adj. R² = -0.128
##
## Standard errors: Robust
## ----------------------------------------------------------------------------
## Est. 2.5% 97.5% t val. p
## -------------------------------- -------- -------- -------- -------- -------
## (Intercept) 4.444 4.219 4.668 39.259 0.000
## refEDpartial college and 0.141 0.007 0.276 2.081 0.040
## below
## ageolder adult -0.020 -0.128 0.089 -0.359 0.720
## ageyoung adult 0.077 -0.038 0.193 1.334 0.185
## gendermale 0.034 -0.038 0.106 0.937 0.351
## ethnicityNon-Hispanic 0.481 0.337 0.626 6.610 0.000
## Black
## ethnicityNon-Hispanic -0.379 -0.516 -0.242 -5.475 0.000
## White
## ethnicityOther Hispanic 0.060 -0.108 0.228 0.707 0.481
## ethnicityOther or Multi -0.702 -0.885 -0.519 -7.598 0.000
## fplfamily income 2x poverty 0.073 -0.034 0.180 1.353 0.179
## threshold
## fplfamily income 3x poverty 0.087 -0.045 0.220 1.303 0.195
## threshold
## fplfamily income 4x poverty 0.146 0.008 0.284 2.103 0.038
## threshold
## fplfamily income 5x poverty 0.158 0.007 0.308 2.075 0.040
## threshold
## fplfamily income more than 0.178 0.038 0.318 2.520 0.013
## 5x poverty threshold
## citizenshipnot U,S, 0.149 0.013 0.286 2.171 0.032
## citizen
## adultEDcollege grad or -0.394 -0.542 -0.245 -5.247 0.000
## above
## adultEDhigh school -0.063 -0.176 0.049 -1.115 0.267
## grad/GED
## adultEDless than 9th grade -0.185 -0.333 -0.037 -2.477 0.015
## adultEDsome college or AA -0.157 -0.288 -0.025 -2.357 0.020
## ----------------------------------------------------------------------------
##
## Estimated dispersion parameter = 2.488
summ(model_adult, confint = TRUE, pvals = FALSE) # drop the p-values altogether
## MODEL INFO:
## Observations: 12132
## Dependent Variable: log(monoEthyl)
## Type: Survey-weighted linear regression
##
## MODEL FIT:
## R² = 0.06
## Adj. R² = -0.13
##
## Standard errors: Robust
## -----------------------------------------------------------------
## Est. 2.5% 97.5% t val.
## -------------------------------- ------- ------- ------- --------
## (Intercept) 4.44 4.22 4.67 39.26
## refEDpartial college and 0.14 0.01 0.28 2.08
## below
## ageolder adult -0.02 -0.13 0.09 -0.36
## ageyoung adult 0.08 -0.04 0.19 1.33
## gendermale 0.03 -0.04 0.11 0.94
## ethnicityNon-Hispanic 0.48 0.34 0.63 6.61
## Black
## ethnicityNon-Hispanic -0.38 -0.52 -0.24 -5.48
## White
## ethnicityOther Hispanic 0.06 -0.11 0.23 0.71
## ethnicityOther or Multi -0.70 -0.89 -0.52 -7.60
## fplfamily income 2x poverty 0.07 -0.03 0.18 1.35
## threshold
## fplfamily income 3x poverty 0.09 -0.05 0.22 1.30
## threshold
## fplfamily income 4x poverty 0.15 0.01 0.28 2.10
## threshold
## fplfamily income 5x poverty 0.16 0.01 0.31 2.07
## threshold
## fplfamily income more than 0.18 0.04 0.32 2.52
## 5x poverty threshold
## citizenshipnot U,S, 0.15 0.01 0.29 2.17
## citizen
## adultEDcollege grad or -0.39 -0.54 -0.24 -5.25
## above
## adultEDhigh school -0.06 -0.18 0.05 -1.11
## grad/GED
## adultEDless than 9th grade -0.19 -0.33 -0.04 -2.48
## adultEDsome college or AA -0.16 -0.29 -0.02 -2.36
## -----------------------------------------------------------------
##
## Estimated dispersion parameter = 2.49
# THE GRAPH
plot_summs(model_adult)
plot_summs(model_adult, inner_ci_level = .9)
plot_summs(model_adult, robust = TRUE)
# plot coefficient uncertainty as normal distributions
plot_summs(model_adult, plot.distributions = TRUE, inner_ci_level = .9)
# table output for Word and RMarkdown documents
# standard errors are shown in parentheses
export_summs(model_adult, scale = TRUE)
| | Model 1 |
|---|---|
| (Intercept) | 4.44 *** |
| | (0.11) |
| refEDpartial college and below | 0.14 * |
| | (0.07) |
| ageolder adult | -0.02 |
| | (0.05) |
| ageyoung adult | 0.08 |
| | (0.06) |
| gendermale | 0.03 |
| | (0.04) |
| ethnicityNon-Hispanic Black | 0.48 *** |
| | (0.07) |
| ethnicityNon-Hispanic White | -0.38 *** |
| | (0.07) |
| ethnicityOther Hispanic | 0.06 |
| | (0.08) |
| ethnicityOther or Multi | -0.70 *** |
| | (0.09) |
| fplfamily income 2x poverty threshold | 0.07 |
| | (0.05) |
| fplfamily income 3x poverty threshold | 0.09 |
| | (0.07) |
| fplfamily income 4x poverty threshold | 0.15 * |
| | (0.07) |
| fplfamily income 5x poverty threshold | 0.16 * |
| | (0.08) |
| fplfamily income more than 5x poverty threshold | 0.18 * |
| | (0.07) |
| citizenshipnot U,S, citizen | 0.15 * |
| | (0.07) |
| adultEDcollege grad or above | -0.39 *** |
| | (0.08) |
| adultEDhigh school grad/GED | -0.06 |
| | (0.06) |
| adultEDless than 9th grade | -0.19 * |
| | (0.07) |
| adultEDsome college or AA | -0.16 * |
| | (0.07) |
| N | 12132 |
| R2 | 0.06 |
| All continuous predictors are mean-centered and scaled by 1 standard deviation. *** p < 0.001; ** p < 0.01; * p < 0.05. | |
# confidence intervals instead of standard errors
export_summs(model_adult, scale = TRUE,
error_format = "[{conf.low}, {conf.high}]")
| | Model 1 |
|---|---|
| (Intercept) | 4.44 *** |
| | [4.22, 4.67] |
| refEDpartial college and below | 0.14 * |
| | [0.01, 0.28] |
| ageolder adult | -0.02 |
| | [-0.13, 0.09] |
| ageyoung adult | 0.08 |
| | [-0.04, 0.19] |
| gendermale | 0.03 |
| | [-0.04, 0.11] |
| ethnicityNon-Hispanic Black | 0.48 *** |
| | [0.34, 0.63] |
| ethnicityNon-Hispanic White | -0.38 *** |
| | [-0.52, -0.24] |
| ethnicityOther Hispanic | 0.06 |
| | [-0.11, 0.23] |
| ethnicityOther or Multi | -0.70 *** |
| | [-0.89, -0.52] |
| fplfamily income 2x poverty threshold | 0.07 |
| | [-0.03, 0.18] |
| fplfamily income 3x poverty threshold | 0.09 |
| | [-0.05, 0.22] |
| fplfamily income 4x poverty threshold | 0.15 * |
| | [0.01, 0.28] |
| fplfamily income 5x poverty threshold | 0.16 * |
| | [0.01, 0.31] |
| fplfamily income more than 5x poverty threshold | 0.18 * |
| | [0.04, 0.32] |
| citizenshipnot U,S, citizen | 0.15 * |
| | [0.01, 0.29] |
| adultEDcollege grad or above | -0.39 *** |
| | [-0.54, -0.24] |
| adultEDhigh school grad/GED | -0.06 |
| | [-0.18, 0.05] |
| adultEDless than 9th grade | -0.19 * |
| | [-0.33, -0.04] |
| adultEDsome college or AA | -0.16 * |
| | [-0.29, -0.02] |
| N | 12132 |
| R2 | 0.06 |
| All continuous predictors are mean-centered and scaled by 1 standard deviation. *** p < 0.001; ** p < 0.01; * p < 0.05. | |
# check the AIC of Model E, then check for an interaction
subset_adult <- subset(nhc, RIDAGEYR > 19)
model_adult <- svyglm(log(monoEthyl)~refED+age+gender+ethnicity+fpl+citizenship+adultED, design=subset_adult, na.action = na.omit)
ols_adult <- (svyglm(log(monoEthyl)~refED+age+gender+ethnicity+fpl+citizenship+adultED, design=subset_adult, na.action = na.omit))
ols_adult
## Stratified 1 - level Cluster Sampling design (with replacement)
## With (244) clusters.
## subset(nhc, RIDAGEYR > 19)
##
## Call: svyglm(formula = log(monoEthyl) ~ refED + age + gender + ethnicity +
## fpl + citizenship + adultED, design = subset_adult, na.action = na.omit)
##
## Coefficients:
## (Intercept)
## 4.44356
## refEDpartial college and below
## 0.14119
## ageolder adult
## -0.01961
## ageyoung adult
## 0.07748
## gendermale
## 0.03390
## ethnicityNon-Hispanic Black
## 0.48139
## ethnicityNon-Hispanic White
## -0.37902
## ethnicityOther Hispanic
## 0.05996
## ethnicityOther or Multi
## -0.70217
## fplfamily income 2x poverty threshold
## 0.07295
## fplfamily income 3x poverty threshold
## 0.08713
## fplfamily income 4x poverty threshold
## 0.14640
## fplfamily income 5x poverty threshold
## 0.15751
## fplfamily income more than 5x poverty threshold
## 0.17778
## citizenshipnot U,S, citizen
## 0.14940
## adultEDcollege grad or above
## -0.39358
## adultEDhigh school grad/GED
## -0.06346
## adultEDless than 9th grade
## -0.18504
## adultEDsome college or AA
## -0.15667
##
## Degrees of Freedom: 12131 Total (i.e. Null); 106 Residual
## (1965 observations deleted due to missingness)
## Null Deviance: 32000
## Residual Deviance: 30180 AIC: 49030
# this gives an AIC of 49,030
# with gender, AIC is the same
# checking adultED*fpl
model_adult_int <- svyglm(log(monoEthyl)~refED+age+gender+ethnicity+fpl*adultED+citizenship, design=subset_adult, na.action = na.omit)
ols_adult_int <- (svyglm(log(monoEthyl)~refED+age+gender+ethnicity+fpl*adultED+citizenship, design=subset_adult, na.action = na.omit))
ols_adult_int
## Stratified 1 - level Cluster Sampling design (with replacement)
## With (244) clusters.
## subset(nhc, RIDAGEYR > 19)
##
## Call: svyglm(formula = log(monoEthyl) ~ refED + age + gender + ethnicity +
## fpl * adultED + citizenship, design = subset_adult, na.action = na.omit)
##
## Coefficients:
## (Intercept)
## 4.36122
## refEDpartial college and below
## 0.13774
## ageolder adult
## -0.01526
## ageyoung adult
## 0.06749
## gendermale
## 0.03305
## ethnicityNon-Hispanic Black
## 0.47803
## ethnicityNon-Hispanic White
## -0.38392
## ethnicityOther Hispanic
## 0.05513
## ethnicityOther or Multi
## -0.71591
## fplfamily income 2x poverty threshold
## 0.22130
## fplfamily income 3x poverty threshold
## 0.24930
## fplfamily income 4x poverty threshold
## 0.17751
## fplfamily income 5x poverty threshold
## 0.58635
## fplfamily income more than 5x poverty threshold
## 0.06208
## adultEDcollege grad or above
## -0.10661
## adultEDhigh school grad/GED
## 0.07886
## adultEDless than 9th grade
## -0.14456
## adultEDsome college or AA
## -0.05738
## citizenshipnot U,S, citizen
## 0.15350
## fplfamily income 2x poverty threshold:adultEDcollege grad or above
## -0.38960
## fplfamily income 3x poverty threshold:adultEDcollege grad or above
## -0.33654
## fplfamily income 4x poverty threshold:adultEDcollege grad or above
## -0.10297
## fplfamily income 5x poverty threshold:adultEDcollege grad or above
## -0.65937
## fplfamily income more than 5x poverty threshold:adultEDcollege grad or above
## -0.11581
## fplfamily income 2x poverty threshold:adultEDhigh school grad/GED
## -0.29087
## fplfamily income 3x poverty threshold:adultEDhigh school grad/GED
## -0.23135
## fplfamily income 4x poverty threshold:adultEDhigh school grad/GED
## -0.06144
## fplfamily income 5x poverty threshold:adultEDhigh school grad/GED
## -0.59091
## fplfamily income more than 5x poverty threshold:adultEDhigh school grad/GED
## 0.24831
## fplfamily income 2x poverty threshold:adultEDless than 9th grade
## -0.08852
## fplfamily income 3x poverty threshold:adultEDless than 9th grade
## -0.12357
## fplfamily income 4x poverty threshold:adultEDless than 9th grade
## 0.13382
## fplfamily income 5x poverty threshold:adultEDless than 9th grade
## 0.15090
## fplfamily income more than 5x poverty threshold:adultEDless than 9th grade
## -0.52742
## fplfamily income 2x poverty threshold:adultEDsome college or AA
## -0.09355
## fplfamily income 3x poverty threshold:adultEDsome college or AA
## -0.18532
## fplfamily income 4x poverty threshold:adultEDsome college or AA
## -0.13516
## fplfamily income 5x poverty threshold:adultEDsome college or AA
## -0.42478
## fplfamily income more than 5x poverty threshold:adultEDsome college or AA
## 0.12183
##
## Degrees of Freedom: 12131 Total (i.e. Null); 86 Residual
## (1965 observations deleted due to missingness)
## Null Deviance: 32000
## Residual Deviance: 30100 AIC: 49040
# this gives an AIC of 49,040
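The comparison of the two adult fits can be tabulated directly from the printed values. This is a small sketch with the AIC and residual degrees of freedom typed in by hand from the two printouts above, and the data frame and column names are illustrative. The printed AIC is rounded to four significant digits, so the difference is only approximate.

```r
# AIC and residual df as printed for the two adult fits above.
fits <- data.frame(
  model    = c("main effects", "fpl * adultED"),
  aic      = c(49030, 49040),
  resid_df = c(106, 86)
)

# Difference from the best (lowest) AIC; smaller is better.
fits$delta_aic <- fits$aic - min(fits$aic)

# Number of extra coefficients used by the interaction model.
extra_terms <- fits$resid_df[1] - fits$resid_df[2]

fits
extra_terms
```

On these numbers the interaction model uses more coefficients without lowering the AIC, which is consistent with keeping the main-effects model. For a design-based check, the survey package's `regTermTest()` is one option worth considering, though it is not used in this analysis.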
subset_child <- subset(nhc, RIDAGEYR <= 19)
# model_child <- svyglm(log(monoEthyl)~refED+age+gender+ethnicity+fpl+citizenship+childED, design=subset_child, na.action = na.omit)
# "model M" in document
model_child <- svyglm(log(monoEthyl)~refED+age+gender+ethnicity*childED+fpl*childED+citizenship, design=subset_child, na.action = na.omit)
summ(model_child)
## MODEL INFO:
## Observations: 6619
## Dependent Variable: log(monoEthyl)
## Type: Survey-weighted linear regression
##
## MODEL FIT:
## R² = 0.11
## Adj. R² = -0.11
##
## Standard errors: Robust
## --------------------------------------------------------------
## Est. S.E. t val. p
## ------------------------------- ------- ------ -------- ------
## (Intercept) 3.74 0.09 39.62 0.00
## refEDpartial college and 0.32 0.07 4.83 0.00
## below
## ageyoung adult -0.11 0.10 -1.08 0.28
## gendermale -0.18 0.04 -4.18 0.00
## ethnicityNon-Hispanic 0.67 0.07 9.30 0.00
## Black
## ethnicityNon-Hispanic -0.24 0.07 -3.28 0.00
## White
## ethnicityOther Hispanic 0.29 0.12 2.36 0.02
## ethnicityOther or Multi -0.24 0.11 -2.24 0.03
## childEDsecondary 0.45 0.12 3.76 0.00
## fplfamily income 2x poverty -0.07 0.06 -1.12 0.26
## threshold
## fplfamily income 3x poverty -0.02 0.08 -0.26 0.80
## threshold
## fplfamily income 4x poverty 0.10 0.09 1.17 0.24
## threshold
## fplfamily income 5x poverty 0.04 0.11 0.39 0.70
## threshold
## fplfamily income more than -0.19 0.10 -1.82 0.07
## 5x poverty threshold
## citizenshipnot U,S, 0.14 0.09 1.46 0.15
## citizen
## ethnicityNon-Hispanic -0.06 0.13 -0.46 0.65
## Black:childEDsecondary
## ethnicityNon-Hispanic 0.18 0.13 1.39 0.17
## White:childEDsecondary
## ethnicityOther -0.07 0.20 -0.37 0.71
## Hispanic:childEDsecondary
## ethnicityOther or 0.01 0.17 0.07 0.95
## Multi:childEDsecondary
## childEDsecondary:fplfamily 0.00 0.12 0.01 1.00
## income 2x poverty threshold
## childEDsecondary:fplfamily 0.10 0.14 0.69 0.49
## income 3x poverty threshold
## childEDsecondary:fplfamily -0.07 0.17 -0.41 0.68
## income 4x poverty threshold
## childEDsecondary:fplfamily 0.21 0.21 1.00 0.32
## income 5x poverty threshold
## childEDsecondary:fplfamily 0.30 0.17 1.79 0.08
## income more than 5x poverty
## threshold
## --------------------------------------------------------------
##
## Estimated dispersion parameter = 1.83
summ(model_child, robust = "HC1") #robust standard errors
## Warning in summ.svyglm(model_child, robust = "HC1"): Robust standard errors are reported by default
## in the survey package.
## MODEL INFO:
## Observations: 6619
## Dependent Variable: log(monoEthyl)
## Type: Survey-weighted linear regression
##
## MODEL FIT:
## R² = 0.11
## Adj. R² = -0.11
##
## Standard errors: Robust
## --------------------------------------------------------------
## Est. S.E. t val. p
## ------------------------------- ------- ------ -------- ------
## (Intercept) 3.74 0.09 39.62 0.00
## refEDpartial college and 0.32 0.07 4.83 0.00
## below
## ageyoung adult -0.11 0.10 -1.08 0.28
## gendermale -0.18 0.04 -4.18 0.00
## ethnicityNon-Hispanic 0.67 0.07 9.30 0.00
## Black
## ethnicityNon-Hispanic -0.24 0.07 -3.28 0.00
## White
## ethnicityOther Hispanic 0.29 0.12 2.36 0.02
## ethnicityOther or Multi -0.24 0.11 -2.24 0.03
## childEDsecondary 0.45 0.12 3.76 0.00
## fplfamily income 2x poverty -0.07 0.06 -1.12 0.26
## threshold
## fplfamily income 3x poverty -0.02 0.08 -0.26 0.80
## threshold
## fplfamily income 4x poverty 0.10 0.09 1.17 0.24
## threshold
## fplfamily income 5x poverty 0.04 0.11 0.39 0.70
## threshold
## fplfamily income more than -0.19 0.10 -1.82 0.07
## 5x poverty threshold
## citizenshipnot U,S, 0.14 0.09 1.46 0.15
## citizen
## ethnicityNon-Hispanic -0.06 0.13 -0.46 0.65
## Black:childEDsecondary
## ethnicityNon-Hispanic 0.18 0.13 1.39 0.17
## White:childEDsecondary
## ethnicityOther -0.07 0.20 -0.37 0.71
## Hispanic:childEDsecondary
## ethnicityOther or 0.01 0.17 0.07 0.95
## Multi:childEDsecondary
## childEDsecondary:fplfamily 0.00 0.12 0.01 1.00
## income 2x poverty threshold
## childEDsecondary:fplfamily 0.10 0.14 0.69 0.49
## income 3x poverty threshold
## childEDsecondary:fplfamily -0.07 0.17 -0.41 0.68
## income 4x poverty threshold
## childEDsecondary:fplfamily 0.21 0.21 1.00 0.32
## income 5x poverty threshold
## childEDsecondary:fplfamily 0.30 0.17 1.79 0.08
## income more than 5x poverty
## threshold
## --------------------------------------------------------------
##
## Estimated dispersion parameter = 1.83
summ(model_child, confint = TRUE, digits = 3) # confidence intervals are often more informative than p-values; summ() reports 95% CIs by default
## MODEL INFO:
## Observations: 6619
## Dependent Variable: log(monoEthyl)
## Type: Survey-weighted linear regression
##
## MODEL FIT:
## R² = 0.109
## Adj. R² = -0.111
##
## Standard errors: Robust
## ---------------------------------------------------------------------------
## Est. 2.5% 97.5% t val. p
## ------------------------------- -------- -------- -------- -------- -------
## (Intercept) 3.736 3.549 3.923 39.624 0.000
## refEDpartial college and 0.322 0.189 0.454 4.827 0.000
## below
## ageyoung adult -0.109 -0.310 0.092 -1.080 0.283
## gendermale -0.185 -0.273 -0.097 -4.182 0.000
## ethnicityNon-Hispanic 0.671 0.528 0.815 9.297 0.000
## Black
## ethnicityNon-Hispanic -0.237 -0.380 -0.094 -3.283 0.001
## White
## ethnicityOther Hispanic 0.289 0.046 0.532 2.363 0.020
## ethnicityOther or Multi -0.245 -0.461 -0.028 -2.239 0.027
## childEDsecondary 0.454 0.214 0.693 3.761 0.000
## fplfamily income 2x poverty -0.065 -0.181 0.050 -1.124 0.264
## threshold
## fplfamily income 3x poverty -0.022 -0.188 0.145 -0.259 0.796
## threshold
## fplfamily income 4x poverty 0.100 -0.069 0.269 1.170 0.245
## threshold
## fplfamily income 5x poverty 0.044 -0.181 0.269 0.387 0.700
## threshold
## fplfamily income more than -0.188 -0.393 0.017 -1.823 0.071
## 5x poverty threshold
## citizenshipnot U,S, 0.138 -0.049 0.325 1.461 0.147
## citizen
## ethnicityNon-Hispanic -0.058 -0.310 0.193 -0.460 0.646
## Black:childEDsecondary
## ethnicityNon-Hispanic 0.175 -0.076 0.426 1.385 0.169
## White:childEDsecondary
## ethnicityOther -0.072 -0.462 0.317 -0.369 0.713
## Hispanic:childEDsecondary
## ethnicityOther or 0.011 -0.323 0.346 0.068 0.946
## Multi:childEDsecondary
## childEDsecondary:fplfamily 0.001 -0.245 0.247 0.006 0.995
## income 2x poverty threshold
## childEDsecondary:fplfamily 0.099 -0.186 0.385 0.689 0.492
## income 3x poverty threshold
## childEDsecondary:fplfamily -0.068 -0.397 0.261 -0.408 0.684
## income 4x poverty threshold
## childEDsecondary:fplfamily 0.208 -0.205 0.621 0.999 0.320
## income 5x poverty threshold
## childEDsecondary:fplfamily 0.297 -0.032 0.625 1.791 0.076
## income more than 5x poverty
## threshold
## ---------------------------------------------------------------------------
##
## Estimated dispersion parameter = 1.826
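With interaction terms in the model, the childEDsecondary row on its own describes only the reference ethnicity and reference fpl level. For any other group, the main effect and the matching interaction coefficient have to be added. A minimal sketch follows, using two estimates copied by hand from the 3-digit table above, with illustrative variable names.

```r
# Coefficients copied from the 3-digit summ(model_child) output (log scale).
b_secondary       <- 0.454  # childEDsecondary (reference ethnicity, reference fpl)
b_white_secondary <- 0.175  # ethnicityNon-Hispanic White:childEDsecondary

# Secondary vs. the reference childED level among Non-Hispanic White
# participants at the reference fpl level.
eff_white <- b_secondary + b_white_secondary

c(log_scale = eff_white, ratio = exp(eff_white))
```

This gives a point estimate only. A confidence interval for the sum needs the covariance between the two coefficients, which the printout does not show. Working from the fitted object, for example with the survey package's `svycontrast()`, would be one way to obtain it.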
summ(model_child, confint = TRUE, pvals = FALSE) # drop the p-values altogether
## MODEL INFO:
## Observations: 6619
## Dependent Variable: log(monoEthyl)
## Type: Survey-weighted linear regression
##
## MODEL FIT:
## R² = 0.11
## Adj. R² = -0.11
##
## Standard errors: Robust
## ----------------------------------------------------------------
## Est. 2.5% 97.5% t val.
## ------------------------------- ------- ------- ------- --------
## (Intercept) 3.74 3.55 3.92 39.62
## refEDpartial college and 0.32 0.19 0.45 4.83
## below
## ageyoung adult -0.11 -0.31 0.09 -1.08
## gendermale -0.18 -0.27 -0.10 -4.18
## ethnicityNon-Hispanic 0.67 0.53 0.81 9.30
## Black
## ethnicityNon-Hispanic -0.24 -0.38 -0.09 -3.28
## White
## ethnicityOther Hispanic 0.29 0.05 0.53 2.36
## ethnicityOther or Multi -0.24 -0.46 -0.03 -2.24
## childEDsecondary 0.45 0.21 0.69 3.76
## fplfamily income 2x poverty -0.07 -0.18 0.05 -1.12
## threshold
## fplfamily income 3x poverty -0.02 -0.19 0.14 -0.26
## threshold
## fplfamily income 4x poverty 0.10 -0.07 0.27 1.17
## threshold
## fplfamily income 5x poverty 0.04 -0.18 0.27 0.39
## threshold
## fplfamily income more than -0.19 -0.39 0.02 -1.82
## 5x poverty threshold
## citizenshipnot U,S, 0.14 -0.05 0.33 1.46
## citizen
## ethnicityNon-Hispanic -0.06 -0.31 0.19 -0.46
## Black:childEDsecondary
## ethnicityNon-Hispanic 0.18 -0.08 0.43 1.39
## White:childEDsecondary
## ethnicityOther -0.07 -0.46 0.32 -0.37
## Hispanic:childEDsecondary
## ethnicityOther or 0.01 -0.32 0.35 0.07
## Multi:childEDsecondary
## childEDsecondary:fplfamily 0.00 -0.25 0.25 0.01
## income 2x poverty threshold
## childEDsecondary:fplfamily 0.10 -0.19 0.38 0.69
## income 3x poverty threshold
## childEDsecondary:fplfamily -0.07 -0.40 0.26 -0.41
## income 4x poverty threshold
## childEDsecondary:fplfamily 0.21 -0.20 0.62 1.00
## income 5x poverty threshold
## childEDsecondary:fplfamily 0.30 -0.03 0.63 1.79
## income more than 5x poverty
## threshold
## ----------------------------------------------------------------
##
## Estimated dispersion parameter = 1.83
# THE GRAPH
plot_summs(model_child)
plot_summs(model_child, inner_ci_level = .9)
plot_summs(model_child, robust = TRUE)
# plot coefficient uncertainty as normal distributions
plot_summs(model_child, plot.distributions = TRUE, inner_ci_level = .9)
# table output for Word and RMarkdown documents
# standard errors are shown in parentheses
export_summs(model_child, scale = TRUE)
| | Model 1 |
|---|---|
| (Intercept) | 3.74 *** |
| | (0.09) |
| refEDpartial college and below | 0.32 *** |
| | (0.07) |
| ageyoung adult | -0.11 |
| | (0.10) |
| gendermale | -0.18 *** |
| | (0.04) |
| ethnicityNon-Hispanic Black | 0.67 *** |
| | (0.07) |
| ethnicityNon-Hispanic White | -0.24 ** |
| | (0.07) |
| ethnicityOther Hispanic | 0.29 * |
| | (0.12) |
| ethnicityOther or Multi | -0.24 * |
| | (0.11) |
| childEDsecondary | 0.45 *** |
| | (0.12) |
| fplfamily income 2x poverty threshold | -0.07 |
| | (0.06) |
| fplfamily income 3x poverty threshold | -0.02 |
| | (0.08) |
| fplfamily income 4x poverty threshold | 0.10 |
| | (0.09) |
| fplfamily income 5x poverty threshold | 0.04 |
| | (0.11) |
| fplfamily income more than 5x poverty threshold | -0.19 |
| | (0.10) |
| citizenshipnot U,S, citizen | 0.14 |
| | (0.09) |
| ethnicityNon-Hispanic Black:childEDsecondary | -0.06 |
| | (0.13) |
| ethnicityNon-Hispanic White:childEDsecondary | 0.18 |
| | (0.13) |
| ethnicityOther Hispanic:childEDsecondary | -0.07 |
| | (0.20) |
| ethnicityOther or Multi:childEDsecondary | 0.01 |
| | (0.17) |
| childEDsecondary:fplfamily income 2x poverty threshold | 0.00 |
| | (0.12) |
| childEDsecondary:fplfamily income 3x poverty threshold | 0.10 |
| | (0.14) |
| childEDsecondary:fplfamily income 4x poverty threshold | -0.07 |
| | (0.17) |
| childEDsecondary:fplfamily income 5x poverty threshold | 0.21 |
| | (0.21) |
| childEDsecondary:fplfamily income more than 5x poverty threshold | 0.30 |
| | (0.17) |
| N | 6619 |
| R2 | 0.11 |
| All continuous predictors are mean-centered and scaled by 1 standard deviation. *** p < 0.001; ** p < 0.01; * p < 0.05. | |
# confidence intervals instead of standard errors
export_summs(model_child, scale = TRUE,
error_format = "[{conf.low}, {conf.high}]")
| | Model 1 |
|---|---|
| (Intercept) | 3.74 *** |
| | [3.55, 3.92] |
| refEDpartial college and below | 0.32 *** |
| | [0.19, 0.45] |
| ageyoung adult | -0.11 |
| | [-0.31, 0.09] |
| gendermale | -0.18 *** |
| | [-0.27, -0.10] |
| ethnicityNon-Hispanic Black | 0.67 *** |
| | [0.53, 0.81] |
| ethnicityNon-Hispanic White | -0.24 ** |
| | [-0.38, -0.09] |
| ethnicityOther Hispanic | 0.29 * |
| | [0.05, 0.53] |
| ethnicityOther or Multi | -0.24 * |
| | [-0.46, -0.03] |
| childEDsecondary | 0.45 *** |
| | [0.21, 0.69] |
| fplfamily income 2x poverty threshold | -0.07 |
| | [-0.18, 0.05] |
| fplfamily income 3x poverty threshold | -0.02 |
| | [-0.19, 0.14] |
| fplfamily income 4x poverty threshold | 0.10 |
| | [-0.07, 0.27] |
| fplfamily income 5x poverty threshold | 0.04 |
| | [-0.18, 0.27] |
| fplfamily income more than 5x poverty threshold | -0.19 |
| | [-0.39, 0.02] |
| citizenshipnot U,S, citizen | 0.14 |
| | [-0.05, 0.33] |
| ethnicityNon-Hispanic Black:childEDsecondary | -0.06 |
| | [-0.31, 0.19] |
| ethnicityNon-Hispanic White:childEDsecondary | 0.18 |
| | [-0.08, 0.43] |
| ethnicityOther Hispanic:childEDsecondary | -0.07 |
| | [-0.46, 0.32] |
| ethnicityOther or Multi:childEDsecondary | 0.01 |
| | [-0.32, 0.35] |
| childEDsecondary:fplfamily income 2x poverty threshold | 0.00 |
| | [-0.25, 0.25] |
| childEDsecondary:fplfamily income 3x poverty threshold | 0.10 |
| | [-0.19, 0.38] |
| childEDsecondary:fplfamily income 4x poverty threshold | -0.07 |
| | [-0.40, 0.26] |
| childEDsecondary:fplfamily income 5x poverty threshold | 0.21 |
| | [-0.20, 0.62] |
| childEDsecondary:fplfamily income more than 5x poverty threshold | 0.30 |
| | [-0.03, 0.63] |
| N | 6619 |
| R2 | 0.11 |
| All continuous predictors are mean-centered and scaled by 1 standard deviation. *** p < 0.001; ** p < 0.01; * p < 0.05. | |
### check AIC of Model D for interaction
subset_child <- subset(nhc, RIDAGEYR <= 19)
model_child <- svyglm(log(monoEthyl)~refED+age+gender+ethnicity+fpl+citizenship+childED, design=subset_child, na.action = na.omit)
ols_child <- svyglm(log(monoEthyl)~refED+age+gender+ethnicity+fpl+citizenship+childED, design=subset_child, na.action = na.omit)
ols_child
## Stratified 1 - level Cluster Sampling design (with replacement)
## With (244) clusters.
## subset(nhc, RIDAGEYR <= 19)
##
## Call: svyglm(formula = log(monoEthyl) ~ refED + age + gender + ethnicity +
## fpl + citizenship + childED, design = subset_child, na.action = na.omit)
##
## Coefficients:
## (Intercept)
## 3.68377
## refEDpartial college and below
## 0.32922
## ageyoung adult
## -0.12090
## gendermale
## -0.18140
## ethnicityNon-Hispanic Black
## 0.64594
## ethnicityNon-Hispanic White
## -0.18883
## ethnicityOther Hispanic
## 0.26191
## ethnicityOther or Multi
## -0.24214
## fplfamily income 2x poverty threshold
## -0.06454
## fplfamily income 3x poverty threshold
## 0.00301
## fplfamily income 4x poverty threshold
## 0.07541
## fplfamily income 5x poverty threshold
## 0.10752
## fplfamily income more than 5x poverty threshold
## -0.08216
## citizenshipnot U,S, citizen
## 0.12371
## childEDsecondary
## 0.62523
##
## Degrees of Freedom: 6618 Total (i.e. Null); 110 Residual
## (1633 observations deleted due to missingness)
## Null Deviance: 13560
## Residual Deviance: 12130 AIC: 24780
# this gives an AIC of 24,780
# (without gender, AIC of 24,800)
# checking childED*fpl
model_child_int <- svyglm(log(monoEthyl)~refED+age+gender+ethnicity+fpl*childED+citizenship, design=subset_child, na.action = na.omit)
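# NOTE: this fit uses design = nhc (the full design), not subset_child as on the line above; the printed output below reflects design = nhc
ols_child_int <- (svyglm(log(monoEthyl)~refED+age+gender+ethnicity+fpl*childED+citizenship, design=nhc, na.action = na.omit))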
ols_child_int
## Stratified 1 - level Cluster Sampling design (with replacement)
## With (244) clusters.
## svydesign(id = ~SDMVPSU, weights = ~WTINT2YR, strata = ~SDMVSTRA,
## nest = TRUE, survey.lonely.psu = "adjust", data = fullNHANES_recat)
##
## Call: svyglm(formula = log(monoEthyl) ~ refED + age + gender + ethnicity +
## fpl * childED + citizenship, design = nhc, na.action = na.omit)
##
## Coefficients:
## (Intercept)
## 3.7316807
## refEDpartial college and below
## 0.3181659
## ageyoung adult
## -0.1059382
## gendermale
## -0.1854323
## ethnicityNon-Hispanic Black
## 0.6476323
## ethnicityNon-Hispanic White
## -0.1826920
## ethnicityOther Hispanic
## 0.2657327
## ethnicityOther or Multi
## -0.2422974
## fplfamily income 2x poverty threshold
## -0.0765838
## fplfamily income 3x poverty threshold
## -0.0468372
## fplfamily income 4x poverty threshold
## 0.0735603
## fplfamily income 5x poverty threshold
## 0.0106034
## fplfamily income more than 5x poverty threshold
## -0.2248540
## childEDsecondary
## 0.5022878
## citizenshipnot U,S, citizen
## 0.1316535
## fplfamily income 2x poverty threshold:childEDsecondary
## 0.0243319
## fplfamily income 3x poverty threshold:childEDsecondary
## 0.1486179
## fplfamily income 4x poverty threshold:childEDsecondary
## -0.0001807
## fplfamily income 5x poverty threshold:childEDsecondary
## 0.2895914
## fplfamily income more than 5x poverty threshold:childEDsecondary
## 0.3903288
##
## Degrees of Freedom: 6618 Total (i.e. Null); 105 Residual
## (15730 observations deleted due to missingness)
## Null Deviance: 8327
## Residual Deviance: 7426 AIC: 24770
# this gives an AIC of 24,770
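The fit printed above was run with design = nhc and reports a null deviance of 8327 with 15730 observations dropped. The earlier fit on subset_child reports 13560 with 1633 dropped. Both keep the same 6619 residual-plus-null degrees of freedom. One plausible reading, which is an assumption and not something the output confirms, is that the weights sit on a different scale in the two designs. The toy sketch below uses simulated data and plain `glm()`, not NHANES or `svyglm()`, to show what rescaling weights does to a Gaussian fit. It says nothing about the design-based standard errors, which can still differ between the two designs.

```r
set.seed(1)
n <- 200
x <- rnorm(n)
y <- 1 + 0.5 * x + rnorm(n)
w <- runif(n, 0.5, 2)                 # hypothetical sampling weights

fit_a <- glm(y ~ x, weights = w)      # weights on one scale
fit_b <- glm(y ~ x, weights = w / 2)  # the same weights, halved

all.equal(coef(fit_a), coef(fit_b))   # compare coefficients
deviance(fit_b) / deviance(fit_a)     # compare deviances
c(AIC(fit_a), AIC(fit_b))             # compare AICs
```

In this toy setting the coefficients and the AIC are unaffected by the rescaling, while the deviance scales with the weights. That pattern would be compatible with comparing AIC values across the two designs, but fitting every candidate model on subset_child would remove the ambiguity.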
# checking childED*ethnicity
model_child_int <- svyglm(log(monoEthyl)~refED+age+gender+ethnicity*childED+fpl+citizenship, design=subset_child, na.action = na.omit)
# NOTE: this fit uses design = nhc (the full design), not subset_child as on the line above
ols_child_int <- (svyglm(log(monoEthyl)~refED+age+gender+ethnicity*childED+fpl+citizenship, design=nhc, na.action = na.omit))
ols_child_int
## Stratified 1 - level Cluster Sampling design (with replacement)
## With (244) clusters.
## svydesign(id = ~SDMVPSU, weights = ~WTINT2YR, strata = ~SDMVSTRA,
## nest = TRUE, survey.lonely.psu = "adjust", data = fullNHANES_recat)
##
## Call: svyglm(formula = log(monoEthyl) ~ refED + age + gender + ethnicity *
## childED + fpl + citizenship, design = nhc, na.action = na.omit)
##
## Coefficients:
## (Intercept)
## 3.71648
## refEDpartial college and below
## 0.33079
## ageyoung adult
## -0.11898
## gendermale
## -0.18235
## ethnicityNon-Hispanic Black
## 0.67001
## ethnicityNon-Hispanic White
## -0.26213
## ethnicityOther Hispanic
## 0.28426
## ethnicityOther or Multi
## -0.25562
## childEDsecondary
## 0.48259
## fplfamily income 2x poverty threshold
## -0.06141
## fplfamily income 3x poverty threshold
## 0.01397
## fplfamily income 4x poverty threshold
## 0.07948
## fplfamily income 5x poverty threshold
## 0.11532
## fplfamily income more than 5x poverty threshold
## -0.07819
## citizenshipnot U,S, citizen
## 0.13473
## ethnicityNon-Hispanic Black:childEDsecondary
## -0.05033
## ethnicityNon-Hispanic White:childEDsecondary
## 0.24679
## ethnicityOther Hispanic:childEDsecondary
## -0.06259
## ethnicityOther or Multi:childEDsecondary
## 0.05825
##
## Degrees of Freedom: 6618 Total (i.e. Null); 106 Residual
## (15730 observations deleted due to missingness)
## Null Deviance: 8327
## Residual Deviance: 7430 AIC: 24770
# this gives an AIC of 24,770
# checking childED*citizenship
model_child_int <- svyglm(log(monoEthyl)~refED+age+gender+ethnicity+fpl+citizenship*childED, design=subset_child, na.action = na.omit)
# NOTE: this fit uses design = nhc (the full design), not subset_child as on the line above
ols_child_int <- (svyglm(log(monoEthyl)~refED+age+gender+ethnicity+fpl+citizenship*childED, design=nhc, na.action = na.omit))
ols_child_int
## Stratified 1 - level Cluster Sampling design (with replacement)
## With (244) clusters.
## svydesign(id = ~SDMVPSU, weights = ~WTINT2YR, strata = ~SDMVSTRA,
## nest = TRUE, survey.lonely.psu = "adjust", data = fullNHANES_recat)
##
## Call: svyglm(formula = log(monoEthyl) ~ refED + age + gender + ethnicity +
## fpl + citizenship * childED, design = nhc, na.action = na.omit)
##
## Coefficients:
## (Intercept)
## 3.684059
## refEDpartial college and below
## 0.329206
## ageyoung adult
## -0.121009
## gendermale
## -0.181365
## ethnicityNon-Hispanic Black
## 0.645824
## ethnicityNon-Hispanic White
## -0.188952
## ethnicityOther Hispanic
## 0.261644
## ethnicityOther or Multi
## -0.242197
## fplfamily income 2x poverty threshold
## -0.064534
## fplfamily income 3x poverty threshold
## 0.002993
## fplfamily income 4x poverty threshold
## 0.075404
## fplfamily income 5x poverty threshold
## 0.107554
## fplfamily income more than 5x poverty threshold
## -0.082134
## citizenshipnot U,S, citizen
## 0.118607
## childEDsecondary
## 0.624637
## citizenshipnot U,S, citizen:childEDsecondary
## 0.013054
##
## Degrees of Freedom: 6618 Total (i.e. Null); 109 Residual
## (15730 observations deleted due to missingness)
## Null Deviance: 8327
## Residual Deviance: 7446 AIC: 24780
# this gives an AIC of 24,780
model_whole <- svyglm(log(monoEthyl)~refED+age+ethnicity+fpl+citizenship, design=nhc, na.action = na.omit)
summ(model_whole)
## MODEL INFO:
## Observations: 19218
## Dependent Variable: log(monoEthyl)
## Type: Survey-weighted linear regression
##
## MODEL FIT:
## R² = 0.06
## Adj. R² = -0.08
##
## Standard errors: Robust
## --------------------------------------------------------------
## Est. S.E. t val. p
## ------------------------------- ------- ------ -------- ------
## (Intercept) 3.71 0.07 51.36 0.00
## refEDpartial college and 0.35 0.05 7.77 0.00
## below
## agemiddle-aged 0.36 0.03 10.78 0.00
## ageolder adult 0.35 0.06 6.09 0.00
## ageyoung adult 0.44 0.05 8.15 0.00
## ethnicityNon-Hispanic 0.57 0.07 8.39 0.00
## Black
## ethnicityNon-Hispanic -0.29 0.06 -4.45 0.00
## White
## ethnicityOther Hispanic 0.15 0.08 1.92 0.06
## ethnicityOther or Multi -0.57 0.08 -6.83 0.00
## fplfamily income 2x poverty 0.03 0.04 0.69 0.49
## threshold
## fplfamily income 3x poverty 0.06 0.06 1.09 0.28
## threshold
## fplfamily income 4x poverty 0.11 0.06 1.82 0.07
## threshold
## fplfamily income 5x poverty 0.12 0.06 1.85 0.07
## threshold
## fplfamily income more than 0.09 0.06 1.53 0.13
## 5x poverty threshold
## citizenshipnot U,S, 0.17 0.06 2.90 0.00
## citizen
## --------------------------------------------------------------
##
## Estimated dispersion parameter = 2.37
summ(model_whole, robust = "HC1") #robust standard errors
## Warning in summ.svyglm(model_whole, robust = "HC1"): Robust standard errors are reported by default
## in the survey package.
## MODEL INFO:
## Observations: 19218
## Dependent Variable: log(monoEthyl)
## Type: Survey-weighted linear regression
##
## MODEL FIT:
## R² = 0.06
## Adj. R² = -0.08
##
## Standard errors: Robust
## --------------------------------------------------------------
## Est. S.E. t val. p
## ------------------------------- ------- ------ -------- ------
## (Intercept) 3.71 0.07 51.36 0.00
## refEDpartial college and 0.35 0.05 7.77 0.00
## below
## agemiddle-aged 0.36 0.03 10.78 0.00
## ageolder adult 0.35 0.06 6.09 0.00
## ageyoung adult 0.44 0.05 8.15 0.00
## ethnicityNon-Hispanic 0.57 0.07 8.39 0.00
## Black
## ethnicityNon-Hispanic -0.29 0.06 -4.45 0.00
## White
## ethnicityOther Hispanic 0.15 0.08 1.92 0.06
## ethnicityOther or Multi -0.57 0.08 -6.83 0.00
## fplfamily income 2x poverty 0.03 0.04 0.69 0.49
## threshold
## fplfamily income 3x poverty 0.06 0.06 1.09 0.28
## threshold
## fplfamily income 4x poverty 0.11 0.06 1.82 0.07
## threshold
## fplfamily income 5x poverty 0.12 0.06 1.85 0.07
## threshold
## fplfamily income more than 0.09 0.06 1.53 0.13
## 5x poverty threshold
## citizenshipnot U,S, 0.17 0.06 2.90 0.00
## citizen
## --------------------------------------------------------------
##
## Estimated dispersion parameter = 2.37
summ(model_whole, confint = TRUE, digits = 3) # confidence intervals are often more informative than p-values; summ() reports 95% CIs by default
## MODEL INFO:
## Observations: 19218
## Dependent Variable: log(monoEthyl)
## Type: Survey-weighted linear regression
##
## MODEL FIT:
## R² = 0.060
## Adj. R² = -0.080
##
## Standard errors: Robust
## ---------------------------------------------------------------------------
## Est. 2.5% 97.5% t val. p
## ------------------------------- -------- -------- -------- -------- -------
## (Intercept) 3.712 3.569 3.855 51.356 0.000
## refEDpartial college and 0.355 0.264 0.445 7.765 0.000
## below
## agemiddle-aged 0.363 0.296 0.430 10.782 0.000
## ageolder adult 0.350 0.236 0.464 6.089 0.000
## ageyoung adult 0.443 0.335 0.551 8.155 0.000
## ethnicityNon-Hispanic 0.570 0.435 0.704 8.388 0.000
## Black
## ethnicityNon-Hispanic -0.287 -0.415 -0.159 -4.452 0.000
## White
## ethnicityOther Hispanic 0.146 -0.004 0.297 1.923 0.057
## ethnicityOther or Multi -0.566 -0.730 -0.402 -6.834 0.000
## fplfamily income 2x poverty 0.031 -0.057 0.118 0.693 0.490
## threshold
## fplfamily income 3x poverty 0.060 -0.049 0.170 1.094 0.277
## threshold
## fplfamily income 4x poverty 0.107 -0.009 0.224 1.822 0.071
## threshold
## fplfamily income 5x poverty 0.118 -0.008 0.244 1.854 0.066
## threshold
## fplfamily income more than 0.094 -0.028 0.216 1.533 0.128
## 5x poverty threshold
## citizenshipnot U,S, 0.168 0.053 0.283 2.899 0.005
## citizen
## ---------------------------------------------------------------------------
##
## Estimated dispersion parameter = 2.372
summ(model_whole, confint = TRUE, pvals = FALSE) # drop the p-values altogether
## MODEL INFO:
## Observations: 19218
## Dependent Variable: log(monoEthyl)
## Type: Survey-weighted linear regression
##
## MODEL FIT:
## R² = 0.06
## Adj. R² = -0.08
##
## Standard errors: Robust
## ----------------------------------------------------------------
## Est. 2.5% 97.5% t val.
## ------------------------------- ------- ------- ------- --------
## (Intercept) 3.71 3.57 3.86 51.36
## refEDpartial college and 0.35 0.26 0.45 7.77
## below
## agemiddle-aged 0.36 0.30 0.43 10.78
## ageolder adult 0.35 0.24 0.46 6.09
## ageyoung adult 0.44 0.34 0.55 8.15
## ethnicityNon-Hispanic 0.57 0.44 0.70 8.39
## Black
## ethnicityNon-Hispanic -0.29 -0.42 -0.16 -4.45
## White
## ethnicityOther Hispanic 0.15 -0.00 0.30 1.92
## ethnicityOther or Multi -0.57 -0.73 -0.40 -6.83
## fplfamily income 2x poverty 0.03 -0.06 0.12 0.69
## threshold
## fplfamily income 3x poverty 0.06 -0.05 0.17 1.09
## threshold
## fplfamily income 4x poverty 0.11 -0.01 0.22 1.82
## threshold
## fplfamily income 5x poverty 0.12 -0.01 0.24 1.85
## threshold
## fplfamily income more than 0.09 -0.03 0.22 1.53
## 5x poverty threshold
## citizenshipnot U,S, 0.17 0.05 0.28 2.90
## citizen
## ----------------------------------------------------------------
##
## Estimated dispersion parameter = 2.37
# THE GRAPH
plot_summs(model_whole)
plot_summs(model_whole, inner_ci_level = .9)
plot_summs(model_whole, robust = TRUE)
# plot coefficient uncertainty as normal distributions
plot_summs(model_whole, plot.distributions = TRUE, inner_ci_level = .9)
# table output for Word and RMarkdown documents
## standard errors are shown in parentheses
export_summs(model_whole, scale = TRUE)
| | Model 1 |
|---|---|
| (Intercept) | 3.71 *** |
| | (0.07) |
| refEDpartial college and below | 0.35 *** |
| | (0.05) |
| agemiddle-aged | 0.36 *** |
| | (0.03) |
| ageolder adult | 0.35 *** |
| | (0.06) |
| ageyoung adult | 0.44 *** |
| | (0.05) |
| ethnicityNon-Hispanic Black | 0.57 *** |
| | (0.07) |
| ethnicityNon-Hispanic White | -0.29 *** |
| | (0.06) |
| ethnicityOther Hispanic | 0.15 |
| | (0.08) |
| ethnicityOther or Multi | -0.57 *** |
| | (0.08) |
| fplfamily income 2x poverty threshold | 0.03 |
| | (0.04) |
| fplfamily income 3x poverty threshold | 0.06 |
| | (0.06) |
| fplfamily income 4x poverty threshold | 0.11 |
| | (0.06) |
| fplfamily income 5x poverty threshold | 0.12 |
| | (0.06) |
| fplfamily income more than 5x poverty threshold | 0.09 |
| | (0.06) |
| citizenshipnot U,S, citizen | 0.17 ** |
| | (0.06) |
| N | 19218 |
| R2 | 0.06 |
| All continuous predictors are mean-centered and scaled by 1 standard deviation. *** p < 0.001; ** p < 0.01; * p < 0.05. | |
# confidence intervals instead of standard errors
export_summs(model_whole, scale = TRUE,
error_format = "[{conf.low}, {conf.high}]")
| | Model 1 |
|---|---|
| (Intercept) | 3.71 *** |
| | [3.57, 3.86] |
| refEDpartial college and below | 0.35 *** |
| | [0.26, 0.45] |
| agemiddle-aged | 0.36 *** |
| | [0.30, 0.43] |
| ageolder adult | 0.35 *** |
| | [0.24, 0.46] |
| ageyoung adult | 0.44 *** |
| | [0.34, 0.55] |
| ethnicityNon-Hispanic Black | 0.57 *** |
| | [0.44, 0.70] |
| ethnicityNon-Hispanic White | -0.29 *** |
| | [-0.42, -0.16] |
| ethnicityOther Hispanic | 0.15 |
| | [-0.00, 0.30] |
| ethnicityOther or Multi | -0.57 *** |
| | [-0.73, -0.40] |
| fplfamily income 2x poverty threshold | 0.03 |
| | [-0.06, 0.12] |
| fplfamily income 3x poverty threshold | 0.06 |
| | [-0.05, 0.17] |
| fplfamily income 4x poverty threshold | 0.11 |
| | [-0.01, 0.22] |
| fplfamily income 5x poverty threshold | 0.12 |
| | [-0.01, 0.24] |
| fplfamily income more than 5x poverty threshold | 0.09 |
| | [-0.03, 0.22] |
| citizenshipnot U,S, citizen | 0.17 ** |
| | [0.05, 0.28] |
| N | 19218 |
| R2 | 0.06 |
| All continuous predictors are mean-centered and scaled by 1 standard deviation. *** p < 0.001; ** p < 0.01; * p < 0.05. | |
# run each of the following model_whole fits one at a time to check each interaction
## interaction: refED*age
# model_whole <- svyglm(log(monoEthyl)~refED*age+gender+ethnicity+fpl+citizenship, design=nhc, na.action = na.omit)
## interaction: age*fpl
# model_whole <- svyglm(log(monoEthyl)~refED+gender+ethnicity+age*fpl+citizenship, design=nhc, na.action = na.omit)
## interaction: ethnicity*fpl
# model_whole <- svyglm(log(monoEthyl)~refED+age+gender+ethnicity*fpl+citizenship, design=nhc, na.action = na.omit)
## interaction: refED*fpl
model_whole <- svyglm(log(monoEthyl)~refED*fpl+gender+ethnicity+age+citizenship, design=nhc, na.action = na.omit)
summ(model_whole)
## MODEL INFO:
## Observations: 19218
## Dependent Variable: log(monoEthyl)
## Type: Survey-weighted linear regression
##
## MODEL FIT:
## R² = 0.06
## Adj. R² = -0.14
##
## Standard errors: Robust
## --------------------------------------------------------------
## Est. S.E. t val. p
## ------------------------------- ------- ------ -------- ------
## (Intercept) 3.76 0.14 26.48 0.00
## refEDpartial college and 0.31 0.14 2.25 0.03
## below
## fplfamily income 2x poverty 0.05 0.14 0.38 0.71
## threshold
## fplfamily income 3x poverty 0.12 0.17 0.74 0.46
## threshold
## fplfamily income 4x poverty 0.23 0.15 1.58 0.12
## threshold
## fplfamily income 5x poverty -0.06 0.15 -0.41 0.68
## threshold
## fplfamily income more than 0.01 0.14 0.06 0.96
## 5x poverty threshold
## gendermale -0.01 0.03 -0.18 0.85
## ethnicityNon-Hispanic 0.57 0.07 8.32 0.00
## Black
## ethnicityNon-Hispanic -0.29 0.06 -4.54 0.00
## White
## ethnicityOther Hispanic 0.14 0.08 1.86 0.07
## ethnicityOther or Multi -0.58 0.08 -6.94 0.00
## agemiddle-aged 0.36 0.03 10.88 0.00
## ageolder adult 0.35 0.06 6.16 0.00
## ageyoung adult 0.44 0.05 8.06 0.00
## citizenshipnot U,S, 0.17 0.06 2.98 0.00
## citizen
## refEDpartial college and -0.03 0.15 -0.19 0.85
## below:fplfamily income 2x
## poverty threshold
## refEDpartial college and -0.09 0.18 -0.48 0.63
## below:fplfamily income 3x
## poverty threshold
## refEDpartial college and -0.19 0.16 -1.19 0.24
## below:fplfamily income 4x
## poverty threshold
## refEDpartial college and 0.27 0.17 1.58 0.12
## below:fplfamily income 5x
## poverty threshold
## refEDpartial college and 0.15 0.15 1.00 0.32
## below:fplfamily income more
## than 5x poverty threshold
## --------------------------------------------------------------
##
## Estimated dispersion parameter = 2.37
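The summary above tests each refED:fpl coefficient separately. To ask whether the interaction as a whole contributes to the model, one option is a joint Wald test with `regTermTest()` from the survey package. The sketch below uses the `api` example data shipped with survey so that it runs on its own; the design `dclus1` and the variables `api00`, `ell`, `stype`, and `meals` come from that example data, not from NHANES.

```r
library(survey)

# Example data bundled with the survey package (not the NHANES data)
data(api)

# One-stage cluster sample of school districts
dclus1 <- svydesign(id = ~dnum, weights = ~pw, fpc = ~fpc, data = apiclus1)

# Model with an interaction between a continuous and a categorical predictor
fit <- svyglm(api00 ~ ell * stype + meals, design = dclus1)

# Joint Wald test of all interaction coefficients at once
joint <- regTermTest(fit, ~ell:stype)
print(joint)
```

For the model fitted above, the analogous call would be `regTermTest(model_whole, ~refED:fpl)`.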
summ(model_whole, robust = "HC1") # svyglm already reports design-based robust standard errors, so this only triggers the warning below
## Warning in summ.svyglm(model_whole, robust = "HC1"): Robust standard errors are reported by default
## in the survey package.
## MODEL INFO:
## Observations: 19218
## Dependent Variable: log(monoEthyl)
## Type: Survey-weighted linear regression
##
## MODEL FIT:
## R² = 0.06
## Adj. R² = -0.14
##
## Standard errors: Robust
## --------------------------------------------------------------
## Est. S.E. t val. p
## ------------------------------- ------- ------ -------- ------
## (Intercept) 3.76 0.14 26.48 0.00
## refEDpartial college and 0.31 0.14 2.25 0.03
## below
## fplfamily income 2x poverty 0.05 0.14 0.38 0.71
## threshold
## fplfamily income 3x poverty 0.12 0.17 0.74 0.46
## threshold
## fplfamily income 4x poverty 0.23 0.15 1.58 0.12
## threshold
## fplfamily income 5x poverty -0.06 0.15 -0.41 0.68
## threshold
## fplfamily income more than 0.01 0.14 0.06 0.96
## 5x poverty threshold
## gendermale -0.01 0.03 -0.18 0.85
## ethnicityNon-Hispanic 0.57 0.07 8.32 0.00
## Black
## ethnicityNon-Hispanic -0.29 0.06 -4.54 0.00
## White
## ethnicityOther Hispanic 0.14 0.08 1.86 0.07
## ethnicityOther or Multi -0.58 0.08 -6.94 0.00
## agemiddle-aged 0.36 0.03 10.88 0.00
## ageolder adult 0.35 0.06 6.16 0.00
## ageyoung adult 0.44 0.05 8.06 0.00
## citizenshipnot U,S, 0.17 0.06 2.98 0.00
## citizen
## refEDpartial college and -0.03 0.15 -0.19 0.85
## below:fplfamily income 2x
## poverty threshold
## refEDpartial college and -0.09 0.18 -0.48 0.63
## below:fplfamily income 3x
## poverty threshold
## refEDpartial college and -0.19 0.16 -1.19 0.24
## below:fplfamily income 4x
## poverty threshold
## refEDpartial college and 0.27 0.17 1.58 0.12
## below:fplfamily income 5x
## poverty threshold
## refEDpartial college and 0.15 0.15 1.00 0.32
## below:fplfamily income more
## than 5x poverty threshold
## --------------------------------------------------------------
##
## Estimated dispersion parameter = 2.37
summ(model_whole, confint = TRUE, digits = 3) # confidence intervals are often more informative than p-values; summ() reports 95% CIs by default
## MODEL INFO:
## Observations: 19218
## Dependent Variable: log(monoEthyl)
## Type: Survey-weighted linear regression
##
## MODEL FIT:
## R² = 0.062
## Adj. R² = -0.141
##
## Standard errors: Robust
## ---------------------------------------------------------------------------
## Est. 2.5% 97.5% t val. p
## ------------------------------- -------- -------- -------- -------- -------
## (Intercept) 3.762 3.481 4.044 26.483 0.000
## refEDpartial college and 0.307 0.036 0.577 2.248 0.027
## below
## fplfamily income 2x poverty 0.053 -0.226 0.332 0.378 0.707
## threshold
## fplfamily income 3x poverty 0.124 -0.209 0.458 0.740 0.461
## threshold
## fplfamily income 4x poverty 0.234 -0.059 0.526 1.583 0.116
## threshold
## fplfamily income 5x poverty -0.060 -0.348 0.228 -0.411 0.682
## threshold
## fplfamily income more than 0.008 -0.274 0.290 0.056 0.955
## 5x poverty threshold
## gendermale -0.005 -0.064 0.053 -0.185 0.854
## ethnicityNon-Hispanic 0.566 0.431 0.701 8.323 0.000
## Black
## ethnicityNon-Hispanic -0.290 -0.417 -0.164 -4.544 0.000
## White
## ethnicityOther Hispanic 0.141 -0.009 0.292 1.860 0.066
## ethnicityOther or Multi -0.578 -0.743 -0.413 -6.938 0.000
## agemiddle-aged 0.363 0.297 0.429 10.877 0.000
## ageolder adult 0.352 0.239 0.466 6.155 0.000
## ageyoung adult 0.440 0.331 0.548 8.059 0.000
## citizenshipnot U,S, 0.172 0.058 0.286 2.981 0.004
## citizen
## refEDpartial college and -0.028 -0.321 0.265 -0.190 0.850
## below:fplfamily income 2x
## poverty threshold
## refEDpartial college and -0.085 -0.438 0.267 -0.479 0.633
## below:fplfamily income 3x
## poverty threshold
## refEDpartial college and -0.190 -0.507 0.128 -1.186 0.238
## below:fplfamily income 4x
## poverty threshold
## refEDpartial college and 0.269 -0.069 0.607 1.579 0.117
## below:fplfamily income 5x
## poverty threshold
## refEDpartial college and 0.151 -0.147 0.449 1.004 0.318
## below:fplfamily income more
## than 5x poverty threshold
## ---------------------------------------------------------------------------
##
## Estimated dispersion parameter = 2.368
summ(model_whole, confint = TRUE, pvals = FALSE) # drop the p-values altogether
## MODEL INFO:
## Observations: 19218
## Dependent Variable: log(monoEthyl)
## Type: Survey-weighted linear regression
##
## MODEL FIT:
## R² = 0.06
## Adj. R² = -0.14
##
## Standard errors: Robust
## ----------------------------------------------------------------
## Est. 2.5% 97.5% t val.
## ------------------------------- ------- ------- ------- --------
## (Intercept) 3.76 3.48 4.04 26.48
## refEDpartial college and 0.31 0.04 0.58 2.25
## below
## fplfamily income 2x poverty 0.05 -0.23 0.33 0.38
## threshold
## fplfamily income 3x poverty 0.12 -0.21 0.46 0.74
## threshold
## fplfamily income 4x poverty 0.23 -0.06 0.53 1.58
## threshold
## fplfamily income 5x poverty -0.06 -0.35 0.23 -0.41
## threshold
## fplfamily income more than 0.01 -0.27 0.29 0.06
## 5x poverty threshold
## gendermale -0.01 -0.06 0.05 -0.18
## ethnicityNon-Hispanic 0.57 0.43 0.70 8.32
## Black
## ethnicityNon-Hispanic -0.29 -0.42 -0.16 -4.54
## White
## ethnicityOther Hispanic 0.14 -0.01 0.29 1.86
## ethnicityOther or Multi -0.58 -0.74 -0.41 -6.94
## agemiddle-aged 0.36 0.30 0.43 10.88
## ageolder adult 0.35 0.24 0.47 6.16
## ageyoung adult 0.44 0.33 0.55 8.06
## citizenshipnot U,S, 0.17 0.06 0.29 2.98
## citizen
## refEDpartial college and -0.03 -0.32 0.27 -0.19
## below:fplfamily income 2x
## poverty threshold
## refEDpartial college and -0.09 -0.44 0.27 -0.48
## below:fplfamily income 3x
## poverty threshold
## refEDpartial college and -0.19 -0.51 0.13 -1.19
## below:fplfamily income 4x
## poverty threshold
## refEDpartial college and 0.27 -0.07 0.61 1.58
## below:fplfamily income 5x
## poverty threshold
## refEDpartial college and 0.15 -0.15 0.45 1.00
## below:fplfamily income more
## than 5x poverty threshold
## ----------------------------------------------------------------
##
## Estimated dispersion parameter = 2.37
# THE GRAPH
plot_summs(model_whole)
plot_summs(model_whole, inner_ci_level = .9)
plot_summs(model_whole, robust = TRUE)
# plot coefficient uncertainty as normal distributions
plot_summs(model_whole, plot.distributions = TRUE, inner_ci_level = .9)
# table output for Word and RMarkdown documents
## standard errors are shown in parentheses
export_summs(model_whole, scale = TRUE)
| | Model 1 |
|---|---|
| (Intercept) | 3.76 *** |
| | (0.14) |
| refEDpartial college and below | 0.31 * |
| | (0.14) |
| fplfamily income 2x poverty threshold | 0.05 |
| | (0.14) |
| fplfamily income 3x poverty threshold | 0.12 |
| | (0.17) |
| fplfamily income 4x poverty threshold | 0.23 |
| | (0.15) |
| fplfamily income 5x poverty threshold | -0.06 |
| | (0.15) |
| fplfamily income more than 5x poverty threshold | 0.01 |
| | (0.14) |
| gendermale | -0.01 |
| | (0.03) |
| ethnicityNon-Hispanic Black | 0.57 *** |
| | (0.07) |
| ethnicityNon-Hispanic White | -0.29 *** |
| | (0.06) |
| ethnicityOther Hispanic | 0.14 |
| | (0.08) |
| ethnicityOther or Multi | -0.58 *** |
| | (0.08) |
| agemiddle-aged | 0.36 *** |
| | (0.03) |
| ageolder adult | 0.35 *** |
| | (0.06) |
| ageyoung adult | 0.44 *** |
| | (0.05) |
| citizenshipnot U,S, citizen | 0.17 ** |
| | (0.06) |
| refEDpartial college and below:fplfamily income 2x poverty threshold | -0.03 |
| | (0.15) |
| refEDpartial college and below:fplfamily income 3x poverty threshold | -0.09 |
| | (0.18) |
| refEDpartial college and below:fplfamily income 4x poverty threshold | -0.19 |
| | (0.16) |
| refEDpartial college and below:fplfamily income 5x poverty threshold | 0.27 |
| | (0.17) |
| refEDpartial college and below:fplfamily income more than 5x poverty threshold | 0.15 |
| | (0.15) |
| N | 19218 |
| R2 | 0.06 |
| All continuous predictors are mean-centered and scaled by 1 standard deviation. *** p < 0.001; ** p < 0.01; * p < 0.05. | |
# confidence intervals instead of standard errors
export_summs(model_whole, scale = TRUE,
error_format = "[{conf.low}, {conf.high}]")
| | Model 1 |
|---|---|
| (Intercept) | 3.76 *** |
| | [3.48, 4.04] |
| refEDpartial college and below | 0.31 * |
| | [0.04, 0.58] |
| fplfamily income 2x poverty threshold | 0.05 |
| | [-0.23, 0.33] |
| fplfamily income 3x poverty threshold | 0.12 |
| | [-0.21, 0.46] |
| fplfamily income 4x poverty threshold | 0.23 |
| | [-0.06, 0.53] |
| fplfamily income 5x poverty threshold | -0.06 |
| | [-0.35, 0.23] |
| fplfamily income more than 5x poverty threshold | 0.01 |
| | [-0.27, 0.29] |
| gendermale | -0.01 |
| | [-0.06, 0.05] |
| ethnicityNon-Hispanic Black | 0.57 *** |
| | [0.43, 0.70] |
| ethnicityNon-Hispanic White | -0.29 *** |
| | [-0.42, -0.16] |
| ethnicityOther Hispanic | 0.14 |
| | [-0.01, 0.29] |
| ethnicityOther or Multi | -0.58 *** |
| | [-0.74, -0.41] |
| agemiddle-aged | 0.36 *** |
| | [0.30, 0.43] |
| ageolder adult | 0.35 *** |
| | [0.24, 0.47] |
| ageyoung adult | 0.44 *** |
| | [0.33, 0.55] |
| citizenshipnot U,S, citizen | 0.17 ** |
| | [0.06, 0.29] |
| refEDpartial college and below:fplfamily income 2x poverty threshold | -0.03 |
| | [-0.32, 0.27] |
| refEDpartial college and below:fplfamily income 3x poverty threshold | -0.09 |
| | [-0.44, 0.27] |
| refEDpartial college and below:fplfamily income 4x poverty threshold | -0.19 |
| | [-0.51, 0.13] |
| refEDpartial college and below:fplfamily income 5x poverty threshold | 0.27 |
| | [-0.07, 0.61] |
| refEDpartial college and below:fplfamily income more than 5x poverty threshold | 0.15 |
| | [-0.15, 0.45] |
| N | 19218 |
| R2 | 0.06 |
| All continuous predictors are mean-centered and scaled by 1 standard deviation. *** p < 0.001; ** p < 0.01; * p < 0.05. | |
# run each of the following model_child fits one at a time to check each interaction
subset_child <- subset(nhc, RIDAGEYR <= 19)
## interaction: childED*fpl
# model_child <- svyglm(log(monoEthyl)~refED+age+gender+ethnicity+fpl*childED+citizenship, design=subset_child, na.action = na.omit)
## interaction: childED*ethnicity
# model_child <- svyglm(log(monoEthyl)~refED+age+gender+ethnicity*childED+fpl+citizenship, design=subset_child, na.action = na.omit)
## interaction: childED*citizenship
model_child <- svyglm(log(monoEthyl)~refED+age+gender+ethnicity+fpl+citizenship*childED, design=subset_child, na.action = na.omit)
summ(model_child)
## MODEL INFO:
## Observations: 6619
## Dependent Variable: log(monoEthyl)
## Type: Survey-weighted linear regression
##
## MODEL FIT:
## R² = 0.11
## Adj. R² = -0.03
##
## Standard errors: Robust
## --------------------------------------------------------------
## Est. S.E. t val. p
## ------------------------------- ------- ------ -------- ------
## (Intercept) 3.68 0.10 37.95 0.00
## refEDpartial college and 0.33 0.07 4.89 0.00
## below
## ageyoung adult -0.12 0.10 -1.20 0.23
## gendermale -0.18 0.04 -4.12 0.00
## ethnicityNon-Hispanic 0.65 0.07 9.01 0.00
## Black
## ethnicityNon-Hispanic -0.19 0.08 -2.45 0.02
## White
## ethnicityOther Hispanic 0.26 0.10 2.58 0.01
## ethnicityOther or Multi -0.24 0.10 -2.35 0.02
## fplfamily income 2x poverty -0.06 0.05 -1.22 0.22
## threshold
## fplfamily income 3x poverty 0.00 0.07 0.04 0.97
## threshold
## fplfamily income 4x poverty 0.08 0.08 0.93 0.35
## threshold
## fplfamily income 5x poverty 0.11 0.10 1.10 0.27
## threshold
## fplfamily income more than -0.08 0.09 -0.91 0.36
## 5x poverty threshold
## citizenshipnot U,S, 0.12 0.11 1.06 0.29
## citizen
## childEDsecondary 0.62 0.05 12.49 0.00
## citizenshipnot U,S, 0.01 0.19 0.07 0.94
## citizen:childEDsecondary
## --------------------------------------------------------------
##
## Estimated dispersion parameter = 1.83
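The child model above was fitted on `subset(nhc, RIDAGEYR <= 19)` rather than on a filtered data frame. Subsetting the design object keeps the full sampling structure available for variance estimation, whereas filtering the data before calling `svydesign()` can drop PSUs and change the standard errors. A small sketch of the pattern, using the `api` example data bundled with survey (the names `dclus1`, `elem_design`, and `elem_mean` are illustrative and not part of the original analysis):

```r
library(survey)

# Example data bundled with the survey package (not the NHANES data)
data(api)
dclus1 <- svydesign(id = ~dnum, weights = ~pw, fpc = ~fpc, data = apiclus1)

# Domain analysis: subset the design object, not the underlying data frame
elem_design <- subset(dclus1, stype == "E")

# Estimate for the subpopulation, with a design-based standard error
elem_mean <- svymean(~api00, elem_design)
print(elem_mean)
```

The same reasoning applies to the adult analysis, which subsets `nhc` in the same way.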
summ(model_child, robust = "HC1") # svyglm already reports design-based robust standard errors, so this only triggers the warning below
## Warning in summ.svyglm(model_child, robust = "HC1"): Robust standard errors are reported by default
## in the survey package.
## MODEL INFO:
## Observations: 6619
## Dependent Variable: log(monoEthyl)
## Type: Survey-weighted linear regression
##
## MODEL FIT:
## R² = 0.11
## Adj. R² = -0.03
##
## Standard errors: Robust
## --------------------------------------------------------------
## Est. S.E. t val. p
## ------------------------------- ------- ------ -------- ------
## (Intercept) 3.68 0.10 37.95 0.00
## refEDpartial college and 0.33 0.07 4.89 0.00
## below
## ageyoung adult -0.12 0.10 -1.20 0.23
## gendermale -0.18 0.04 -4.12 0.00
## ethnicityNon-Hispanic 0.65 0.07 9.01 0.00
## Black
## ethnicityNon-Hispanic -0.19 0.08 -2.45 0.02
## White
## ethnicityOther Hispanic 0.26 0.10 2.58 0.01
## ethnicityOther or Multi -0.24 0.10 -2.35 0.02
## fplfamily income 2x poverty -0.06 0.05 -1.22 0.22
## threshold
## fplfamily income 3x poverty 0.00 0.07 0.04 0.97
## threshold
## fplfamily income 4x poverty 0.08 0.08 0.93 0.35
## threshold
## fplfamily income 5x poverty 0.11 0.10 1.10 0.27
## threshold
## fplfamily income more than -0.08 0.09 -0.91 0.36
## 5x poverty threshold
## citizenshipnot U,S, 0.12 0.11 1.06 0.29
## citizen
## childEDsecondary 0.62 0.05 12.49 0.00
## citizenshipnot U,S, 0.01 0.19 0.07 0.94
## citizen:childEDsecondary
## --------------------------------------------------------------
##
## Estimated dispersion parameter = 1.83
summ(model_child, confint = TRUE, digits = 3) # confidence intervals are often more informative than p-values; summ() reports 95% CIs by default
## MODEL INFO:
## Observations: 6619
## Dependent Variable: log(monoEthyl)
## Type: Survey-weighted linear regression
##
## MODEL FIT:
## R² = 0.106
## Adj. R² = -0.033
##
## Standard errors: Robust
## ---------------------------------------------------------------------------
## Est. 2.5% 97.5% t val. p
## ------------------------------- -------- -------- -------- -------- -------
## (Intercept) 3.684 3.492 3.876 37.953 0.000
## refEDpartial college and 0.329 0.196 0.463 4.888 0.000
## below
## ageyoung adult -0.121 -0.321 0.079 -1.198 0.234
## gendermale -0.181 -0.269 -0.094 -4.124 0.000
## ethnicityNon-Hispanic 0.646 0.504 0.788 9.012 0.000
## Black
## ethnicityNon-Hispanic -0.189 -0.342 -0.036 -2.452 0.016
## White
## ethnicityOther Hispanic 0.262 0.060 0.463 2.577 0.011
## ethnicityOther or Multi -0.242 -0.446 -0.038 -2.351 0.020
## fplfamily income 2x poverty -0.065 -0.169 0.040 -1.222 0.224
## threshold
## fplfamily income 3x poverty 0.003 -0.144 0.150 0.040 0.968
## threshold
## fplfamily income 4x poverty 0.075 -0.085 0.236 0.932 0.353
## threshold
## fplfamily income 5x poverty 0.108 -0.087 0.302 1.098 0.275
## threshold
## fplfamily income more than -0.082 -0.261 0.097 -0.910 0.365
## 5x poverty threshold
## citizenshipnot U,S, 0.119 -0.103 0.340 1.063 0.290
## citizen
## childEDsecondary 0.625 0.526 0.724 12.489 0.000
## citizenshipnot U,S, 0.013 -0.354 0.380 0.070 0.944
## citizen:childEDsecondary
## ---------------------------------------------------------------------------
##
## Estimated dispersion parameter = 1.833
summ(model_child, confint = TRUE, pvals = FALSE) # drop the p-values altogether
## MODEL INFO:
## Observations: 6619
## Dependent Variable: log(monoEthyl)
## Type: Survey-weighted linear regression
##
## MODEL FIT:
## R² = 0.11
## Adj. R² = -0.03
##
## Standard errors: Robust
## ----------------------------------------------------------------
## Est. 2.5% 97.5% t val.
## ------------------------------- ------- ------- ------- --------
## (Intercept) 3.68 3.49 3.88 37.95
## refEDpartial college and 0.33 0.20 0.46 4.89
## below
## ageyoung adult -0.12 -0.32 0.08 -1.20
## gendermale -0.18 -0.27 -0.09 -4.12
## ethnicityNon-Hispanic 0.65 0.50 0.79 9.01
## Black
## ethnicityNon-Hispanic -0.19 -0.34 -0.04 -2.45
## White
## ethnicityOther Hispanic 0.26 0.06 0.46 2.58
## ethnicityOther or Multi -0.24 -0.45 -0.04 -2.35
## fplfamily income 2x poverty -0.06 -0.17 0.04 -1.22
## threshold
## fplfamily income 3x poverty 0.00 -0.14 0.15 0.04
## threshold
## fplfamily income 4x poverty 0.08 -0.08 0.24 0.93
## threshold
## fplfamily income 5x poverty 0.11 -0.09 0.30 1.10
## threshold
## fplfamily income more than -0.08 -0.26 0.10 -0.91
## 5x poverty threshold
## citizenshipnot U,S, 0.12 -0.10 0.34 1.06
## citizen
## childEDsecondary 0.62 0.53 0.72 12.49
## citizenshipnot U,S, 0.01 -0.35 0.38 0.07
## citizen:childEDsecondary
## ----------------------------------------------------------------
##
## Estimated dispersion parameter = 1.83
# THE GRAPH
plot_summs(model_child)
plot_summs(model_child, inner_ci_level = .9)
plot_summs(model_child, robust = TRUE)
# plot coefficient uncertainty as normal distributions
plot_summs(model_child, plot.distributions = TRUE, inner_ci_level = .9)
# table output for Word and RMarkdown documents
## standard errors are shown in parentheses
export_summs(model_child, scale = TRUE)
| | Model 1 |
|---|---|
| (Intercept) | 3.68 *** |
| | (0.10) |
| refEDpartial college and below | 0.33 *** |
| | (0.07) |
| ageyoung adult | -0.12 |
| | (0.10) |
| gendermale | -0.18 *** |
| | (0.04) |
| ethnicityNon-Hispanic Black | 0.65 *** |
| | (0.07) |
| ethnicityNon-Hispanic White | -0.19 * |
| | (0.08) |
| ethnicityOther Hispanic | 0.26 * |
| | (0.10) |
| ethnicityOther or Multi | -0.24 * |
| | (0.10) |
| fplfamily income 2x poverty threshold | -0.06 |
| | (0.05) |
| fplfamily income 3x poverty threshold | 0.00 |
| | (0.07) |
| fplfamily income 4x poverty threshold | 0.08 |
| | (0.08) |
| fplfamily income 5x poverty threshold | 0.11 |
| | (0.10) |
| fplfamily income more than 5x poverty threshold | -0.08 |
| | (0.09) |
| citizenshipnot U,S, citizen | 0.12 |
| | (0.11) |
| childEDsecondary | 0.62 *** |
| | (0.05) |
| citizenshipnot U,S, citizen:childEDsecondary | 0.01 |
| | (0.19) |
| N | 6619 |
| R2 | 0.11 |
| All continuous predictors are mean-centered and scaled by 1 standard deviation. *** p < 0.001; ** p < 0.01; * p < 0.05. | |
# confidence intervals instead of standard errors
export_summs(model_child, scale = TRUE,
error_format = "[{conf.low}, {conf.high}]")
| | Model 1 |
|---|---|
| (Intercept) | 3.68 *** |
| | [3.49, 3.88] |
| refEDpartial college and below | 0.33 *** |
| | [0.20, 0.46] |
| ageyoung adult | -0.12 |
| | [-0.32, 0.08] |
| gendermale | -0.18 *** |
| | [-0.27, -0.09] |
| ethnicityNon-Hispanic Black | 0.65 *** |
| | [0.50, 0.79] |
| ethnicityNon-Hispanic White | -0.19 * |
| | [-0.34, -0.04] |
| ethnicityOther Hispanic | 0.26 * |
| | [0.06, 0.46] |
| ethnicityOther or Multi | -0.24 * |
| | [-0.45, -0.04] |
| fplfamily income 2x poverty threshold | -0.06 |
| | [-0.17, 0.04] |
| fplfamily income 3x poverty threshold | 0.00 |
| | [-0.14, 0.15] |
| fplfamily income 4x poverty threshold | 0.08 |
| | [-0.08, 0.24] |
| fplfamily income 5x poverty threshold | 0.11 |
| | [-0.09, 0.30] |
| fplfamily income more than 5x poverty threshold | -0.08 |
| | [-0.26, 0.10] |
| citizenshipnot U,S, citizen | 0.12 |
| | [-0.10, 0.34] |
| childEDsecondary | 0.62 *** |
| | [0.53, 0.72] |
| citizenshipnot U,S, citizen:childEDsecondary | 0.01 |
| | [-0.35, 0.38] |
| N | 6619 |
| R2 | 0.11 |
| All continuous predictors are mean-centered and scaled by 1 standard deviation. *** p < 0.001; ** p < 0.01; * p < 0.05. | |
# run each of the following model_adult fits one at a time to check each interaction
subset_adult <- subset(nhc, RIDAGEYR > 19)
## interaction: refED*adultED
# model_adult <- svyglm(log(monoEthyl)~refED*adultED+age+gender+ethnicity+fpl+citizenship, design=subset_adult, na.action = na.omit)
## interaction: adultED*fpl
# model_adult <- svyglm(log(monoEthyl)~refED+age+gender+ethnicity+fpl*adultED+citizenship, design=subset_adult, na.action = na.omit)
## interaction: adultED*ethnicity
# model_adult <- svyglm(log(monoEthyl)~refED+age+gender+ethnicity*adultED+fpl+citizenship, design=subset_adult, na.action = na.omit)
## interaction: adultED*citizenship
model_adult <- svyglm(log(monoEthyl)~refED+age+gender+ethnicity+fpl+citizenship*adultED, design=subset_adult, na.action = na.omit)
summ(model_adult)
## MODEL INFO:
## Observations: 12132
## Dependent Variable: log(monoEthyl)
## Type: Survey-weighted linear regression
##
## MODEL FIT:
## R² = 0.06
## Adj. R² = -0.17
##
## Standard errors: Robust
## ---------------------------------------------------------------
## Est. S.E. t val. p
## -------------------------------- ------- ------ -------- ------
## (Intercept) 4.47 0.11 39.64 0.00
## refEDpartial college and 0.14 0.07 2.07 0.04
## below
## ageolder adult -0.02 0.05 -0.38 0.71
## ageyoung adult 0.08 0.06 1.32 0.19
## gendermale 0.03 0.04 0.95 0.34
## ethnicityNon-Hispanic 0.47 0.07 6.61 0.00
## Black
## ethnicityNon-Hispanic -0.39 0.07 -5.74 0.00
## White
## ethnicityOther Hispanic 0.05 0.08 0.62 0.54
## ethnicityOther or Multi -0.71 0.09 -7.58 0.00
## fplfamily income 2x poverty 0.07 0.05 1.34 0.18
## threshold
## fplfamily income 3x poverty 0.09 0.07 1.31 0.19
## threshold
## fplfamily income 4x poverty 0.15 0.07 2.12 0.04
## threshold
## fplfamily income 5x poverty 0.16 0.08 2.09 0.04
## threshold
## fplfamily income more than 0.18 0.07 2.55 0.01
## 5x poverty threshold
## citizenshipnot U,S, 0.02 0.11 0.15 0.88
## citizen
## adultEDcollege grad or -0.41 0.08 -5.17 0.00
## above
## adultEDhigh school -0.08 0.06 -1.32 0.19
## grad/GED
## adultEDless than 9th grade -0.19 0.09 -2.00 0.05
## adultEDsome college or AA -0.18 0.07 -2.70 0.01
## citizenshipnot U,S, 0.04 0.16 0.27 0.78
## citizen:adultEDcollege grad or
## above
## citizenshipnot U,S, 0.15 0.15 0.98 0.33
## citizen:adultEDhigh school
## grad/GED
## citizenshipnot U,S, 0.10 0.15 0.65 0.52
## citizen:adultEDless than 9th
## grade
## citizenshipnot U,S, 0.35 0.18 1.92 0.06
## citizen:adultEDsome college or
## AA
## ---------------------------------------------------------------
##
## Estimated dispersion parameter = 2.49
summ(model_adult, robust = "HC1") # svyglm already reports design-based robust standard errors, so this only triggers the warning below
## Warning in summ.svyglm(model_adult, robust = "HC1"): Robust standard errors are reported by default
## in the survey package.
## MODEL INFO:
## Observations: 12132
## Dependent Variable: log(monoEthyl)
## Type: Survey-weighted linear regression
##
## MODEL FIT:
## R² = 0.06
## Adj. R² = -0.17
##
## Standard errors: Robust
## ---------------------------------------------------------------
## Est. S.E. t val. p
## -------------------------------- ------- ------ -------- ------
## (Intercept) 4.47 0.11 39.64 0.00
## refEDpartial college and 0.14 0.07 2.07 0.04
## below
## ageolder adult -0.02 0.05 -0.38 0.71
## ageyoung adult 0.08 0.06 1.32 0.19
## gendermale 0.03 0.04 0.95 0.34
## ethnicityNon-Hispanic 0.47 0.07 6.61 0.00
## Black
## ethnicityNon-Hispanic -0.39 0.07 -5.74 0.00
## White
## ethnicityOther Hispanic 0.05 0.08 0.62 0.54
## ethnicityOther or Multi -0.71 0.09 -7.58 0.00
## fplfamily income 2x poverty 0.07 0.05 1.34 0.18
## threshold
## fplfamily income 3x poverty 0.09 0.07 1.31 0.19
## threshold
## fplfamily income 4x poverty 0.15 0.07 2.12 0.04
## threshold
## fplfamily income 5x poverty 0.16 0.08 2.09 0.04
## threshold
## fplfamily income more than 0.18 0.07 2.55 0.01
## 5x poverty threshold
## citizenshipnot U,S, 0.02 0.11 0.15 0.88
## citizen
## adultEDcollege grad or -0.41 0.08 -5.17 0.00
## above
## adultEDhigh school -0.08 0.06 -1.32 0.19
## grad/GED
## adultEDless than 9th grade -0.19 0.09 -2.00 0.05
## adultEDsome college or AA -0.18 0.07 -2.70 0.01
## citizenshipnot U,S, 0.04 0.16 0.27 0.78
## citizen:adultEDcollege grad or
## above
## citizenshipnot U,S, 0.15 0.15 0.98 0.33
## citizen:adultEDhigh school
## grad/GED
## citizenshipnot U,S, 0.10 0.15 0.65 0.52
## citizen:adultEDless than 9th
## grade
## citizenshipnot U,S, 0.35 0.18 1.92 0.06
## citizen:adultEDsome college or
## AA
## ---------------------------------------------------------------
##
## Estimated dispersion parameter = 2.49
summ(model_adult, confint = TRUE, digits = 3) # confidence intervals are often more informative than p-values; summ() reports 95% CIs by default
## MODEL INFO:
## Observations: 12132
## Dependent Variable: log(monoEthyl)
## Type: Survey-weighted linear regression
##
## MODEL FIT:
## R² = 0.057
## Adj. R² = -0.172
##
## Standard errors: Robust
## ----------------------------------------------------------------------------
## Est. 2.5% 97.5% t val. p
## -------------------------------- -------- -------- -------- -------- -------
## (Intercept) 4.469 4.245 4.692 39.640 0.000
## refEDpartial college and 0.140 0.006 0.274 2.075 0.041
## below
## ageolder adult -0.020 -0.129 0.088 -0.375 0.708
## ageyoung adult 0.077 -0.039 0.193 1.318 0.190
## gendermale 0.034 -0.037 0.106 0.949 0.345
## ethnicityNon-Hispanic 0.471 0.330 0.613 6.613 0.000
## Black
## ethnicityNon-Hispanic -0.388 -0.522 -0.254 -5.738 0.000
## White
## ethnicityOther Hispanic 0.052 -0.115 0.219 0.617 0.538
## ethnicityOther or Multi -0.707 -0.893 -0.522 -7.576 0.000
## fplfamily income 2x poverty 0.073 -0.035 0.180 1.345 0.182
## threshold
## fplfamily income 3x poverty 0.088 -0.045 0.221 1.309 0.193
## threshold
## fplfamily income 4x poverty 0.148 0.009 0.286 2.118 0.037
## threshold
## fplfamily income 5x poverty 0.159 0.008 0.309 2.086 0.040
## threshold
## fplfamily income more than 0.180 0.040 0.319 2.546 0.012
## 5x poverty threshold
## citizenshipnot U,S, 0.016 -0.204 0.237 0.149 0.882
## citizen
## adultEDcollege grad or -0.406 -0.562 -0.250 -5.170 0.000
## above
## adultEDhigh school -0.081 -0.203 0.040 -1.323 0.189
## grad/GED
## adultEDless than 9th grade -0.190 -0.378 -0.002 -2.002 0.048
## adultEDsome college or AA -0.184 -0.319 -0.049 -2.697 0.008
## citizenshipnot U,S, 0.044 -0.274 0.362 0.274 0.785
## citizen:adultEDcollege grad or
## above
## citizenshipnot U,S, 0.149 -0.153 0.451 0.980 0.329
## citizen:adultEDhigh school
## grad/GED
## citizenshipnot U,S, 0.098 -0.203 0.399 0.646 0.520
## citizen:adultEDless than 9th
## grade
## citizenshipnot U,S, 0.353 -0.011 0.716 1.925 0.057
## citizen:adultEDsome college or
## AA
## ----------------------------------------------------------------------------
##
## Estimated dispersion parameter = 2.487
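Because the outcome is log(monoEthyl), each coefficient is a difference on the log scale, and exponentiating it gives a multiplicative ratio relative to the reference category. The sketch below back-transforms one row of the table above (ethnicityNon-Hispanic Black). The numbers are copied by hand from that output, and the object names are illustrative only.

```r
# Estimate and 95% CI for "ethnicityNon-Hispanic Black", copied from the
# summ() output above (log scale).
log_scale <- c(estimate = 0.471, lower = 0.330, upper = 0.613)

# exp() of a log-scale difference is a ratio relative to the reference group.
ratio_scale <- exp(log_scale)

# The same numbers expressed as a percent difference from the reference group.
percent_diff <- 100 * (ratio_scale - 1)

print(round(ratio_scale, 2))
print(round(percent_diff, 1))
```

With the fitted model in the session, `exp(coef(model_adult))` and `exp(confint(model_adult))` should give the same back-transformation for every term at once.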
summ(model_adult, confint = TRUE, pvals = FALSE) # drop the p-values altogether
## MODEL INFO:
## Observations: 12132
## Dependent Variable: log(monoEthyl)
## Type: Survey-weighted linear regression
##
## MODEL FIT:
## R² = 0.06
## Adj. R² = -0.17
##
## Standard errors: Robust
## -----------------------------------------------------------------
## Est. 2.5% 97.5% t val.
## -------------------------------- ------- ------- ------- --------
## (Intercept) 4.47 4.25 4.69 39.64
## refEDpartial college and 0.14 0.01 0.27 2.07
## below
## ageolder adult -0.02 -0.13 0.09 -0.38
## ageyoung adult 0.08 -0.04 0.19 1.32
## gendermale 0.03 -0.04 0.11 0.95
## ethnicityNon-Hispanic 0.47 0.33 0.61 6.61
## Black
## ethnicityNon-Hispanic -0.39 -0.52 -0.25 -5.74
## White
## ethnicityOther Hispanic 0.05 -0.12 0.22 0.62
## ethnicityOther or Multi -0.71 -0.89 -0.52 -7.58
## fplfamily income 2x poverty 0.07 -0.03 0.18 1.34
## threshold
## fplfamily income 3x poverty 0.09 -0.05 0.22 1.31
## threshold
## fplfamily income 4x poverty 0.15 0.01 0.29 2.12
## threshold
## fplfamily income 5x poverty 0.16 0.01 0.31 2.09
## threshold
## fplfamily income more than 0.18 0.04 0.32 2.55
## 5x poverty threshold
## citizenshipnot U,S, 0.02 -0.20 0.24 0.15
## citizen
## adultEDcollege grad or -0.41 -0.56 -0.25 -5.17
## above
## adultEDhigh school -0.08 -0.20 0.04 -1.32
## grad/GED
## adultEDless than 9th grade -0.19 -0.38 -0.00 -2.00
## adultEDsome college or AA -0.18 -0.32 -0.05 -2.70
## citizenshipnot U,S, 0.04 -0.27 0.36 0.27
## citizen:adultEDcollege grad or
## above
## citizenshipnot U,S, 0.15 -0.15 0.45 0.98
## citizen:adultEDhigh school
## grad/GED
## citizenshipnot U,S, 0.10 -0.20 0.40 0.65
## citizen:adultEDless than 9th
## grade
## citizenshipnot U,S, 0.35 -0.01 0.72 1.92
## citizen:adultEDsome college or
## AA
## -----------------------------------------------------------------
##
## Estimated dispersion parameter = 2.49
# THE GRAPH
plot_summs(model_adult)
plot_summs(model_adult, inner_ci_level = .9)
plot_summs(model_adult, robust = TRUE)
# plot coefficient uncertainty as normal distributions
plot_summs(model_adult, plot.distributions = TRUE, inner_ci_level = .9)
# table output for Word and RMarkdown documents
## standard errors are shown in parentheses
export_summs(model_adult, scale = TRUE)
| | Model 1 |
|---|---|
| (Intercept) | 4.47 *** |
| | (0.11) |
| refEDpartial college and below | 0.14 * |
| | (0.07) |
| ageolder adult | -0.02 |
| | (0.05) |
| ageyoung adult | 0.08 |
| | (0.06) |
| gendermale | 0.03 |
| | (0.04) |
| ethnicityNon-Hispanic Black | 0.47 *** |
| | (0.07) |
| ethnicityNon-Hispanic White | -0.39 *** |
| | (0.07) |
| ethnicityOther Hispanic | 0.05 |
| | (0.08) |
| ethnicityOther or Multi | -0.71 *** |
| | (0.09) |
| fplfamily income 2x poverty threshold | 0.07 |
| | (0.05) |
| fplfamily income 3x poverty threshold | 0.09 |
| | (0.07) |
| fplfamily income 4x poverty threshold | 0.15 * |
| | (0.07) |
| fplfamily income 5x poverty threshold | 0.16 * |
| | (0.08) |
| fplfamily income more than 5x poverty threshold | 0.18 * |
| | (0.07) |
| citizenshipnot U,S, citizen | 0.02 |
| | (0.11) |
| adultEDcollege grad or above | -0.41 *** |
| | (0.08) |
| adultEDhigh school grad/GED | -0.08 |
| | (0.06) |
| adultEDless than 9th grade | -0.19 * |
| | (0.09) |
| adultEDsome college or AA | -0.18 ** |
| | (0.07) |
| citizenshipnot U,S, citizen:adultEDcollege grad or above | 0.04 |
| | (0.16) |
| citizenshipnot U,S, citizen:adultEDhigh school grad/GED | 0.15 |
| | (0.15) |
| citizenshipnot U,S, citizen:adultEDless than 9th grade | 0.10 |
| | (0.15) |
| citizenshipnot U,S, citizen:adultEDsome college or AA | 0.35 |
| | (0.18) |
| N | 12132 |
| R2 | 0.06 |
| All continuous predictors are mean-centered and scaled by 1 standard deviation. *** p < 0.001; ** p < 0.01; * p < 0.05. | |
# confidence intervals instead of standard errors
export_summs(model_adult, scale = TRUE,
error_format = "[{conf.low}, {conf.high}]")
| | Model 1 |
|---|---|
| (Intercept) | 4.47 *** |
| | [4.25, 4.69] |
| refEDpartial college and below | 0.14 * |
| | [0.01, 0.27] |
| ageolder adult | -0.02 |
| | [-0.13, 0.09] |
| ageyoung adult | 0.08 |
| | [-0.04, 0.19] |
| gendermale | 0.03 |
| | [-0.04, 0.11] |
| ethnicityNon-Hispanic Black | 0.47 *** |
| | [0.33, 0.61] |
| ethnicityNon-Hispanic White | -0.39 *** |
| | [-0.52, -0.25] |
| ethnicityOther Hispanic | 0.05 |
| | [-0.12, 0.22] |
| ethnicityOther or Multi | -0.71 *** |
| | [-0.89, -0.52] |
| fplfamily income 2x poverty threshold | 0.07 |
| | [-0.03, 0.18] |
| fplfamily income 3x poverty threshold | 0.09 |
| | [-0.05, 0.22] |
| fplfamily income 4x poverty threshold | 0.15 * |
| | [0.01, 0.29] |
| fplfamily income 5x poverty threshold | 0.16 * |
| | [0.01, 0.31] |
| fplfamily income more than 5x poverty threshold | 0.18 * |
| | [0.04, 0.32] |
| citizenshipnot U,S, citizen | 0.02 |
| | [-0.20, 0.24] |
| adultEDcollege grad or above | -0.41 *** |
| | [-0.56, -0.25] |
| adultEDhigh school grad/GED | -0.08 |
| | [-0.20, 0.04] |
| adultEDless than 9th grade | -0.19 * |
| | [-0.38, -0.00] |
| adultEDsome college or AA | -0.18 ** |
| | [-0.32, -0.05] |
| citizenshipnot U,S, citizen:adultEDcollege grad or above | 0.04 |
| | [-0.27, 0.36] |
| citizenshipnot U,S, citizen:adultEDhigh school grad/GED | 0.15 |
| | [-0.15, 0.45] |
| citizenshipnot U,S, citizen:adultEDless than 9th grade | 0.10 |
| | [-0.20, 0.40] |
| citizenshipnot U,S, citizen:adultEDsome college or AA | 0.35 |
| | [-0.01, 0.72] |
| N | 12132 |
| R2 | 0.06 |
| All continuous predictors are mean-centered and scaled by 1 standard deviation. *** p < 0.001; ** p < 0.01; * p < 0.05. | |
The model that performs best is Model B, a survey-weighted multiple linear regression of logged mono-ethyl phthalate on the reference person’s education level and the participant’s age, ethnicity, family income-to-poverty ratio, and citizenship status.
The delta BIC for Model C is 72.136, far beyond the delta BIC < 7 threshold, so Model C is very unlikely and we dismiss it. This reinforces the idea that citizenship status matters substantially for explaining the logged phthalate level in individuals.
Model A and Model B are quite similar: the only difference is that Model B does not include gender. Model A, the second-best model, has a delta BIC of 6.590, which is still under the delta BIC < 7 threshold.
An ANOVA is a statistical test of whether the means of two or more categorical groups differ, based on comparing the variance between groups with the variance within groups.
A Wald test is a parametric test of whether a set of independent variables is collectively ‘significant’ in a model.
modela<-svyglm(log(monoEthyl)~refED+age+gender+ethnicity+fpl+citizenship, design=nhc, na.action = na.omit)
modelb<-svyglm(log(monoEthyl)~refED+age+ethnicity+fpl+citizenship, design=nhc, na.action = na.omit)
modelc<-svyglm(log(monoEthyl)~refED+age+ethnicity+fpl, design=nhc, na.action = na.omit)
modeld<-svyglm(log(monoEthyl)~age+ethnicity+fpl, design=nhc, na.action = na.omit)
modele<-svyglm(log(monoEthyl)~refED+age+ethnicity+fpl+citizenship+childED, design=nhc, na.action = na.omit)
########## BIC
BIC(modela, modelb, maximal=modela)
## p BIC neff
## [1,] 16 45748.15 NaN
## [2,] 15 45739.74 4604.58
BIC(modelb, modelc, maximal=modelb)
## p BIC neff
## [1,] 15 45738.41 NaN
## [2,] 14 45738.26 5288.861
BIC(modela, modelc, maximal=modela)
## p BIC neff
## [1,] 16 45748.15 NaN
## [2,] 14 45739.74 4957.434
BIC(modela, modelb, modelc, maximal=modela)
## p BIC neff
## [1,] 16 45748.15 NaN
## [2,] 15 45739.74 4604.580
## [3,] 14 45739.74 4957.434
### ???? see questions in "meetings with susie" google doc
BIC_list <- c(BIC(modela, maximal=modela), BIC(modelb, maximal=modela), BIC(modelc, maximal=modela))
model_output <- rbind(data.frame(glance(modela)), data.frame(glance(modelb)), data.frame(glance(modelc))) %>% select(BIC)
model_output <- mutate(model_output, delta.BIC = BIC-min(BIC_list))
model_output$model <- c("Model A", "Model B", "Model C")
model_output <- model_output[,c("model", "BIC", "delta.BIC")]
kable(model_output, format = "markdown", digits = 3, caption = "BIC, and Delta.BIC for the models. Delta BIC > 7 indicates models that should be dismissed from further consideration.")
| model | BIC | delta.BIC |
|---|---|---|
| Model A | 45748.15 | NaN |
| Model B | 45738.41 | NaN |
| Model C | 45803.82 | NaN |
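The delta.BIC column above is NaN, most likely because `BIC_list` is built from the full `BIC()` return values, which also carry the `p` and `neff` entries, including a `NaN` for the maximal model. `min()` of a vector containing `NaN` is `NaN`. A minimal sketch of the intended calculation, using only the BIC values printed in the table (object names here are illustrative):

```r
# BIC values copied from the table above.
bic <- c("Model A" = 45748.15, "Model B" = 45738.41, "Model C" = 45803.82)

# min() propagates NaN, which is one way a delta column ends up all NaN.
print(min(c(bic, NaN)))

# Subtract the smallest BIC among the candidate models only.
delta_bic <- bic - min(bic)

model_table <- data.frame(
  model     = names(bic),
  BIC       = unname(bic),
  delta.BIC = unname(delta_bic)
)
print(model_table)
```

In the pipeline above, the equivalent change would be to take the minimum over the `BIC` column of `model_output` rather than over `BIC_list`. These deltas are computed from the table values only and need not match figures quoted from other runs.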
########## ANOVA
########## Wald test
anova(modela)
## Anova table: (Rao-Scott LRT)
## svyglm(formula = log(monoEthyl) ~ refED, design = nhc, na.action = na.omit)
## stats DEff df ddf p
## refED 2668.6416 8.56660 1.00000 123 < 2.2e-16 ***
## age 480.9227 5.09450 3.00000 120 4.043e-13 ***
## gender 0.5746 4.22020 1.00000 119 0.7067
## ethnicity 1796.7690 5.18580 4.00000 115 < 2.2e-16 ***
## fpl 3102.9314 4.66830 5.00000 110 < 2.2e-16 ***
## citizenship 75.3416 3.61700 1.00000 109 1.440e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
anova(modelc, modela)
## Working (Rao-Scott+F) LRT for gender citizenship
## in svyglm(formula = log(monoEthyl) ~ refED + age + gender + ethnicity +
## fpl + citizenship, design = nhc, na.action = na.omit)
## Working 2logLR = 19.30158 p= 0.00016308
## (scale factors: 1.1 0.89 ); denominator df= 109
anova(modela, modelb, method = "Wald")
## Wald test for gender
## in svyglm(formula = log(monoEthyl) ~ refED + age + gender + ethnicity +
## fpl + citizenship, design = nhc, na.action = na.omit)
## F = 0.03018945 on 1 and 109 df: p= 0.86238
# Wald test for gender, p = 0.86238
anova(modelb, modelc, method = "Wald")
## Wald test for citizenship
## in svyglm(formula = log(monoEthyl) ~ refED + age + ethnicity + fpl +
## citizenship, design = nhc, na.action = na.omit)
## F = 8.405465 on 1 and 110 df: p= 0.0045171
# wald test for citizenship, p =0.0045171
anova(modela, modelc, method = "Wald")
## Wald test for gender citizenship
## in svyglm(formula = log(monoEthyl) ~ refED + age + gender + ethnicity +
## fpl + citizenship, design = nhc, na.action = na.omit)
## F = 4.298927 on 2 and 109 df: p= 0.015958
# wald test for gender citizenship, p = 0.015958
anova(modela, modeld, method = "Wald")
## Wald test for refED gender citizenship
## in svyglm(formula = log(monoEthyl) ~ refED + age + gender + ethnicity +
## fpl + citizenship, design = nhc, na.action = na.omit)
## F = 22.43636 on 3 and 109 df: p= 2.1802e-11
# Wald test for refED gender citizenship, p = 2.1802e-11
anova(modele)
## Anova table: (Rao-Scott LRT)
## svyglm(formula = log(monoEthyl) ~ refED, design = nhc, na.action = na.omit)
## stats DEff df ddf p
## refED 2668.642 8.5666 1.0000 123 < 2.2e-16 ***
## age 480.923 5.0945 3.0000 120 4.043e-13 ***
## ethnicity 1797.344 5.1837 4.0000 116 < 2.2e-16 ***
## fpl 3102.892 4.6784 5.0000 111 < 2.2e-16 ***
## citizenship 75.255 3.6369 1.0000 110 1.517e-05 ***
## childED 38111.077 1.8139 1.0000 111 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# anova(modela, modele, method = "Wald")
## A Wald test cannot be run here because the models are fitted to different numbers of observations. Does this mean the childED model uses only about 34% of the data, because 66% of childED is missing?
vis_miss(fullNHANES_recat, sort_miss = TRUE)
# change the noNAs dataset with each boxplot I create:
## one for: refED age gender ethnicity fpl citizenship
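noNAs <- fullNHANES_recat %>% filter(!is.na(citizenship), !is.na(monoEthyl))
# ggplot() has no design argument, so this boxplot is unweighted; svyboxplot() from the survey package draws a design-weighted version
box_citizenship <- ggplot(data = noNAs,
aes(x=log(monoEthyl), y=citizenship, fill=citizenship)) +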
scale_fill_brewer(palette="PuBuGn") +
geom_boxplot() +
theme(text = element_text(size=12)) +
xlab("(logged) Mono-Ethyl Phthalate Level (ng/mL)") +
ylab("Participant Citizenship Status") +
ggtitle("Participant Citizenship Status and Logged Phthalate Level")
box_citizenship
Ordinary least squares (OLS) regression is a method for finding the line that best describes the relationship between one or more predictor variables and a response variable (source: https://www.statology.org/ols-regression-in-r/).
AIC is an estimator of prediction error and thereby of the relative quality of statistical models for a given set of data. Lower values indicate a better model, and AIC values are only comparable between models fitted to the same observations.
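One caution when reading the AIC values in this section: a model fitted to fewer rows will usually report a much smaller AIC simply because there are fewer observations in the likelihood. The sketch below illustrates this with simulated data and plain `lm()`, not the NHANES design object. All names and numbers are hypothetical.

```r
set.seed(1)

# Simulated data (hypothetical): one predictor, one outcome.
n <- 1000
x <- rnorm(n)
y <- 1 + 0.5 * x + rnorm(n)
dat <- data.frame(x = x, y = y)

# Same model formula, fitted to all rows and to a 30% subset.
fit_all    <- lm(y ~ x, data = dat)
fit_subset <- lm(y ~ x, data = dat[1:300, ])

# The subset model has a much smaller AIC here even though it is not a
# better model; it just has fewer observations contributing to the likelihood.
aic_all    <- AIC(fit_all)
aic_subset <- AIC(fit_subset)
print(c(all = aic_all, subset = aic_subset))
```

This is why models that drop many rows to missingness should not be ranked against the others by AIC alone.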
ols1 <- (svyglm(log(monoEthyl)~1, design=nhc, na.action = na.omit))
ols1
## Stratified 1 - level Cluster Sampling design (with replacement)
## With (244) clusters.
## svydesign(id = ~SDMVPSU, weights = ~WTINT2YR, strata = ~SDMVSTRA,
## nest = TRUE, survey.lonely.psu = "adjust", data = fullNHANES_recat)
##
## Call: svyglm(formula = log(monoEthyl) ~ 1, design = nhc, na.action = na.omit)
##
## Coefficients:
## (Intercept)
## 4.182
##
## Degrees of Freedom: 21436 Total (i.e. Null); 124 Residual
## (912 observations deleted due to missingness)
## Null Deviance: 53720
## Residual Deviance: 53720 AIC: 88140
# this gives an AIC of 88140
## ^^ this is just practice... from the article I read online
### what does ~1 mean?
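In an R formula, `~ 1` asks for an intercept-only model: no predictors, just a single constant. For a linear model that constant is the mean of the outcome (the weighted mean when weights are supplied), so the intercept of 4.182 above can be read as the survey-weighted mean of log(monoEthyl). A small sketch with simulated values and plain `lm()`; the data and weights are hypothetical:

```r
set.seed(2)
y <- rnorm(50, mean = 4, sd = 1)  # hypothetical outcome values
w <- runif(50, min = 1, max = 5)  # hypothetical sampling weights

# Intercept-only models, unweighted and weighted.
fit_null          <- lm(y ~ 1)
fit_null_weighted <- lm(y ~ 1, weights = w)

# The single coefficient matches the (weighted) mean of y.
print(c(intercept = unname(coef(fit_null)), mean = mean(y)))
print(c(intercept = unname(coef(fit_null_weighted)),
        weighted_mean = weighted.mean(y, w)))
```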
# modela <- svyglm(log(monoEthyl)~refED+age+gender+ethnicity+fpl+citizenship, design=nhc, na.action = na.omit)
ols_a <- (svyglm(log(monoEthyl)~refED+age+gender+ethnicity+fpl+citizenship, design=nhc, na.action = na.omit))
ols_a
## Stratified 1 - level Cluster Sampling design (with replacement)
## With (244) clusters.
## svydesign(id = ~SDMVPSU, weights = ~WTINT2YR, strata = ~SDMVSTRA,
## nest = TRUE, survey.lonely.psu = "adjust", data = fullNHANES_recat)
##
## Call: svyglm(formula = log(monoEthyl) ~ refED + age + gender + ethnicity +
## fpl + citizenship, design = nhc, na.action = na.omit)
##
## Coefficients:
## (Intercept)
## 3.714420
## refEDpartial college and below
## 0.354819
## agemiddle-aged
## 0.363008
## ageolder adult
## 0.349740
## ageyoung adult
## 0.443082
## gendermale
## -0.005086
## ethnicityNon-Hispanic Black
## 0.569355
## ethnicityNon-Hispanic White
## -0.287305
## ethnicityOther Hispanic
## 0.146058
## ethnicityOther or Multi
## -0.565951
## fplfamily income 2x poverty threshold
## 0.030810
## fplfamily income 3x poverty threshold
## 0.060688
## fplfamily income 4x poverty threshold
## 0.107575
## fplfamily income 5x poverty threshold
## 0.118099
## fplfamily income more than 5x poverty threshold
## 0.094568
## citizenshipnot U,S, citizen
## 0.168622
##
## Degrees of Freedom: 19217 Total (i.e. Null); 109 Residual
## (3131 observations deleted due to missingness)
## Null Deviance: 48520
## Residual Deviance: 45590 AIC: 77730
# this gives an AIC of 77,730
# modelb <- svyglm(log(monoEthyl)~refED+age+ethnicity+fpl+citizenship, design=nhc, na.action = na.omit)
ols_b <- (svyglm(log(monoEthyl)~refED+age+ethnicity+fpl+citizenship, design=nhc, na.action = na.omit))
ols_b
## Stratified 1 - level Cluster Sampling design (with replacement)
## With (244) clusters.
## svydesign(id = ~SDMVPSU, weights = ~WTINT2YR, strata = ~SDMVSTRA,
## nest = TRUE, survey.lonely.psu = "adjust", data = fullNHANES_recat)
##
## Call: svyglm(formula = log(monoEthyl) ~ refED + age + ethnicity + fpl +
## citizenship, design = nhc, na.action = na.omit)
##
## Coefficients:
## (Intercept)
## 3.71194
## refEDpartial college and below
## 0.35481
## agemiddle-aged
## 0.36316
## ageolder adult
## 0.35010
## ageyoung adult
## 0.44318
## ethnicityNon-Hispanic Black
## 0.56960
## ethnicityNon-Hispanic White
## -0.28719
## ethnicityOther Hispanic
## 0.14634
## ethnicityOther or Multi
## -0.56566
## fplfamily income 2x poverty threshold
## 0.03056
## fplfamily income 3x poverty threshold
## 0.06039
## fplfamily income 4x poverty threshold
## 0.10723
## fplfamily income 5x poverty threshold
## 0.11772
## fplfamily income more than 5x poverty threshold
## 0.09414
## citizenshipnot U,S, citizen
## 0.16831
##
## Degrees of Freedom: 19217 Total (i.e. Null); 110 Residual
## (3131 observations deleted due to missingness)
## Null Deviance: 48520
## Residual Deviance: 45590 AIC: 77730
# this gives an AIC of 77,730
# this means that taking gender out does not improve or decrease the prediction of monoEthyl?
# modelc <- svyglm(log(monoEthyl)~refED+age+ethnicity+fpl, design=nhc, na.action = na.omit)
ols_c <- (svyglm(log(monoEthyl)~refED+age+ethnicity+fpl, design=nhc, na.action = na.omit))
ols_c
## Stratified 1 - level Cluster Sampling design (with replacement)
## With (244) clusters.
## svydesign(id = ~SDMVPSU, weights = ~WTINT2YR, strata = ~SDMVSTRA,
## nest = TRUE, survey.lonely.psu = "adjust", data = fullNHANES_recat)
##
## Call: svyglm(formula = log(monoEthyl) ~ refED + age + ethnicity + fpl,
## design = nhc, na.action = na.omit)
##
## Coefficients:
## (Intercept)
## 3.76310
## refEDpartial college and below
## 0.35058
## agemiddle-aged
## 0.37458
## ageolder adult
## 0.35633
## ageyoung adult
## 0.45522
## ethnicityNon-Hispanic Black
## 0.52409
## ethnicityNon-Hispanic White
## -0.33458
## ethnicityOther Hispanic
## 0.13437
## ethnicityOther or Multi
## -0.58169
## fplfamily income 2x poverty threshold
## 0.02752
## fplfamily income 3x poverty threshold
## 0.05597
## fplfamily income 4x poverty threshold
## 0.09813
## fplfamily income 5x poverty threshold
## 0.10755
## fplfamily income more than 5x poverty threshold
## 0.08395
##
## Degrees of Freedom: 19234 Total (i.e. Null); 111 Residual
## (3114 observations deleted due to missingness)
## Null Deviance: 48550
## Residual Deviance: 45670 AIC: 77810
# this gives an AIC of 77,810
# this means that taking out citizenship decreases our ability to predict phthalate level?
#(try)
# take out ethnicity
ols_d <- (svyglm(log(monoEthyl)~refED+age+fpl+citizenship, design=nhc, na.action = na.omit))
ols_d
## Stratified 1 - level Cluster Sampling design (with replacement)
## With (244) clusters.
## svydesign(id = ~SDMVPSU, weights = ~WTINT2YR, strata = ~SDMVSTRA,
## nest = TRUE, survey.lonely.psu = "adjust", data = fullNHANES_recat)
##
## Call: svyglm(formula = log(monoEthyl) ~ refED + age + fpl + citizenship,
## design = nhc, na.action = na.omit)
##
## Coefficients:
## (Intercept)
## 3.65224
## refEDpartial college and below
## 0.40997
## agemiddle-aged
## 0.34705
## ageolder adult
## 0.27423
## ageyoung adult
## 0.42196
## fplfamily income 2x poverty threshold
## -0.03668
## fplfamily income 3x poverty threshold
## -0.05921
## fplfamily income 4x poverty threshold
## -0.03702
## fplfamily income 5x poverty threshold
## -0.04981
## fplfamily income more than 5x poverty threshold
## -0.08725
## citizenshipnot U,S, citizen
## 0.18685
##
## Degrees of Freedom: 19217 Total (i.e. Null); 114 Residual
## (3131 observations deleted due to missingness)
## Null Deviance: 48520
## Residual Deviance: 47290 AIC: 78430
# this gives an AIC of 78,430
# take out age
ols_e <- (svyglm(log(monoEthyl)~refED+ethnicity+fpl+citizenship, design=nhc, na.action = na.omit))
ols_e
## Stratified 1 - level Cluster Sampling design (with replacement)
## With (244) clusters.
## svydesign(id = ~SDMVPSU, weights = ~WTINT2YR, strata = ~SDMVSTRA,
## nest = TRUE, survey.lonely.psu = "adjust", data = fullNHANES_recat)
##
## Call: svyglm(formula = log(monoEthyl) ~ refED + ethnicity + fpl + citizenship,
## design = nhc, na.action = na.omit)
##
## Coefficients:
## (Intercept)
## 3.91599
## refEDpartial college and below
## 0.36530
## ethnicityNon-Hispanic Black
## 0.61722
## ethnicityNon-Hispanic White
## -0.22422
## ethnicityOther Hispanic
## 0.17728
## ethnicityOther or Multi
## -0.53028
## fplfamily income 2x poverty threshold
## 0.04858
## fplfamily income 3x poverty threshold
## 0.07574
## fplfamily income 4x poverty threshold
## 0.13273
## fplfamily income 5x poverty threshold
## 0.15205
## fplfamily income more than 5x poverty threshold
## 0.14101
## citizenshipnot U,S, citizen
## 0.24904
##
## Degrees of Freedom: 19217 Total (i.e. Null); 113 Residual
## (3131 observations deleted due to missingness)
## Null Deviance: 48520
## Residual Deviance: 46040 AIC: 77910
# this gives an AIC of 77,910
# take out refED
ols_f <- (svyglm(log(monoEthyl)~age+ethnicity+fpl+citizenship, design=nhc, na.action = na.omit))
ols_f
## Stratified 1 - level Cluster Sampling design (with replacement)
## With (244) clusters.
## svydesign(id = ~SDMVPSU, weights = ~WTINT2YR, strata = ~SDMVSTRA,
## nest = TRUE, survey.lonely.psu = "adjust", data = fullNHANES_recat)
##
## Call: svyglm(formula = log(monoEthyl) ~ age + ethnicity + fpl + citizenship,
## design = nhc, na.action = na.omit)
##
## Coefficients:
## (Intercept)
## 4.06866
## agemiddle-aged
## 0.36902
## ageolder adult
## 0.36568
## ageyoung adult
## 0.44037
## ethnicityNon-Hispanic Black
## 0.53583
## ethnicityNon-Hispanic White
## -0.34139
## ethnicityOther Hispanic
## 0.13326
## ethnicityOther or Multi
## -0.67698
## fplfamily income 2x poverty threshold
## 0.01900
## fplfamily income 3x poverty threshold
## 0.02816
## fplfamily income 4x poverty threshold
## 0.05634
## fplfamily income 5x poverty threshold
## 0.03176
## fplfamily income more than 5x poverty threshold
## -0.06905
## citizenshipnot U,S, citizen
## 0.14984
##
## Degrees of Freedom: 19782 Total (i.e. Null); 111 Residual
## (2566 observations deleted due to missingness)
## Null Deviance: 50110
## Residual Deviance: 47490 AIC: 80310
# this gives an AIC of 80,310
# take out fpl
ols_g <- (svyglm(log(monoEthyl)~refED+age+ethnicity+citizenship, design=nhc, na.action = na.omit))
ols_g
## Stratified 1 - level Cluster Sampling design (with replacement)
## With (244) clusters.
## svydesign(id = ~SDMVPSU, weights = ~WTINT2YR, strata = ~SDMVSTRA,
## nest = TRUE, survey.lonely.psu = "adjust", data = fullNHANES_recat)
##
## Call: svyglm(formula = log(monoEthyl) ~ refED + age + ethnicity + citizenship,
## design = nhc, na.action = na.omit)
##
## Coefficients:
## (Intercept) refEDpartial college and below
## 3.7671 0.3202
## agemiddle-aged ageolder adult
## 0.3936 0.3435
## ageyoung adult ethnicityNon-Hispanic Black
## 0.4436 0.5683
## ethnicityNon-Hispanic White ethnicityOther Hispanic
## -0.2672 0.1505
## ethnicityOther or Multi citizenshipnot U,S, citizen
## -0.5109 0.1526
##
## Degrees of Freedom: 20651 Total (i.e. Null); 115 Residual
## (1697 observations deleted due to missingness)
## Null Deviance: 51730
## Residual Deviance: 48680 AIC: 83550
# this gives an AIC of 83,550
# take out citizenship
ols_h <- (svyglm(log(monoEthyl)~refED+age+ethnicity+fpl, design=nhc, na.action = na.omit))
ols_h
## Stratified 1 - level Cluster Sampling design (with replacement)
## With (244) clusters.
## svydesign(id = ~SDMVPSU, weights = ~WTINT2YR, strata = ~SDMVSTRA,
## nest = TRUE, survey.lonely.psu = "adjust", data = fullNHANES_recat)
##
## Call: svyglm(formula = log(monoEthyl) ~ refED + age + ethnicity + fpl,
## design = nhc, na.action = na.omit)
##
## Coefficients:
## (Intercept)
## 3.76310
## refEDpartial college and below
## 0.35058
## agemiddle-aged
## 0.37458
## ageolder adult
## 0.35633
## ageyoung adult
## 0.45522
## ethnicityNon-Hispanic Black
## 0.52409
## ethnicityNon-Hispanic White
## -0.33458
## ethnicityOther Hispanic
## 0.13437
## ethnicityOther or Multi
## -0.58169
## fplfamily income 2x poverty threshold
## 0.02752
## fplfamily income 3x poverty threshold
## 0.05597
## fplfamily income 4x poverty threshold
## 0.09813
## fplfamily income 5x poverty threshold
## 0.10755
## fplfamily income more than 5x poverty threshold
## 0.08395
##
## Degrees of Freedom: 19234 Total (i.e. Null); 111 Residual
## (3114 observations deleted due to missingness)
## Null Deviance: 48550
## Residual Deviance: 45670 AIC: 77810
# this gives an AIC of 77,810
# add in childED and gender
ols_i <- (svyglm(log(monoEthyl)~refED+age+gender+ethnicity+fpl+citizenship+childED, design=nhc, na.action = na.omit))
ols_i
## Stratified 1 - level Cluster Sampling design (with replacement)
## With (244) clusters.
## svydesign(id = ~SDMVPSU, weights = ~WTINT2YR, strata = ~SDMVSTRA,
## nest = TRUE, survey.lonely.psu = "adjust", data = fullNHANES_recat)
##
## Call: svyglm(formula = log(monoEthyl) ~ refED + age + gender + ethnicity +
## fpl + citizenship + childED, design = nhc, na.action = na.omit)
##
## Coefficients:
## (Intercept)
## 3.68377
## refEDpartial college and below
## 0.32922
## ageyoung adult
## -0.12090
## gendermale
## -0.18140
## ethnicityNon-Hispanic Black
## 0.64594
## ethnicityNon-Hispanic White
## -0.18883
## ethnicityOther Hispanic
## 0.26191
## ethnicityOther or Multi
## -0.24214
## fplfamily income 2x poverty threshold
## -0.06454
## fplfamily income 3x poverty threshold
## 0.00301
## fplfamily income 4x poverty threshold
## 0.07541
## fplfamily income 5x poverty threshold
## 0.10752
## fplfamily income more than 5x poverty threshold
## -0.08216
## citizenshipnot U,S, citizen
## 0.12371
## childEDsecondary
## 0.62523
##
## Degrees of Freedom: 6618 Total (i.e. Null); 110 Residual
## (15730 observations deleted due to missingness)
## Null Deviance: 8327
## Residual Deviance: 7446 AIC: 24780
# this gives an AIC of 24,780
# (without gender, AIC of 24,800)
# add in adultED
ols_j <- (svyglm(log(monoEthyl)~refED+age+ethnicity+fpl+citizenship+adultED, design=nhc, na.action = na.omit))
ols_j
## Stratified 1 - level Cluster Sampling design (with replacement)
## With (244) clusters.
## svydesign(id = ~SDMVPSU, weights = ~WTINT2YR, strata = ~SDMVSTRA,
## nest = TRUE, survey.lonely.psu = "adjust", data = fullNHANES_recat)
##
## Call: svyglm(formula = log(monoEthyl) ~ refED + age + ethnicity + fpl +
## citizenship + adultED, design = nhc, na.action = na.omit)
##
## Coefficients:
## (Intercept)
## 4.46006
## refEDpartial college and below
## 0.14025
## ageolder adult
## -0.02136
## ageyoung adult
## 0.07768
## ethnicityNon-Hispanic Black
## 0.47990
## ethnicityNon-Hispanic White
## -0.37930
## ethnicityOther Hispanic
## 0.05900
## ethnicityOther or Multi
## -0.70407
## fplfamily income 2x poverty threshold
## 0.07539
## fplfamily income 3x poverty threshold
## 0.09065
## fplfamily income 4x poverty threshold
## 0.15013
## fplfamily income 5x poverty threshold
## 0.16150
## fplfamily income more than 5x poverty threshold
## 0.18239
## citizenshipnot U,S, citizen
## 0.15135
## adultEDcollege grad or above
## -0.39638
## adultEDhigh school grad/GED
## -0.06412
## adultEDless than 9th grade
## -0.18396
## adultEDsome college or AA
## -0.15959
##
## Degrees of Freedom: 12131 Total (i.e. Null); 107 Residual
## (10217 observations deleted due to missingness)
## Null Deviance: 39230
## Residual Deviance: 37010 AIC: 49030
# this gives an AIC of 49,030
# with gender, AIC is the same
predmarg<-svypredmeans(ols1, ~interaction(gender,ethnicity))
predmarg
## mean SE
## male.Non-Hispanic Black 4.8278 0.0490
## male.Other or Multi 3.7870 0.0771
## female.Non-Hispanic White 4.0267 0.0377
## female.Mexican American 4.3944 0.0579
## male.Mexican American 4.3826 0.0536
## male.Non-Hispanic White 4.0404 0.0328
## female.Non-Hispanic Black 4.9654 0.0433
## female.Other Hispanic 4.5672 0.0841
## female.Other or Multi 3.7306 0.0690
## male.Other Hispanic 4.4880 0.0758
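Non-parametric tests can also be done. Let’s start with a Wilcoxon rank-sum test, which is the non-parametric analog of an independent-samples t-test. Because age has four categories here, svyranktest reports the multi-group (Kruskal-Wallis) form of the test.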
wil <- svyranktest(log(monoEthyl)~age, design = nhc, na = TRUE, test = c("wilcoxon"))
wil
##
## Design-based KruskalWallis test
##
## data: log(monoEthyl) ~ age
## df = 3, Chisq = 155, p-value < 2.2e-16
This is an example of a median test.
mtest <- svyranktest(log(monoEthyl)~age, design = nhc, na = TRUE, test=("median"))
mtest
##
## Design-based median test
##
## data: log(monoEthyl) ~ age
## df = 3, Chisq = 131.71, p-value < 2.2e-16
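This is an example of a Kruskal-Wallis test, which is the non-parametric analog of a one-way ANOVA. Because refED has only two categories, the result is reported as a two-group comparison of mean rank scores.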
kwtest <- svyranktest(log(monoEthyl)~refED, design = nhc, na = TRUE, test=("KruskalWallis"))
kwtest
##
## Design-based KruskalWallis test
##
## data: log(monoEthyl) ~ refED
## t = 9.0082, df = 123, p-value = 3.303e-15
## alternative hypothesis: true difference in mean rank score is not equal to 0
## sample estimates:
## difference in mean rank score
## 0.07585648
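The outputs above switch between a chi-square form (four age groups, df = 3) and a two-group form (refED). The unweighted base R analogs show how the two relate: with two groups, the rank-sum test and the Kruskal-Wallis test give the same p-value, and with more groups only Kruskal-Wallis applies. This sketch uses simulated data and `wilcox.test()`/`kruskal.test()` rather than the design-based `svyranktest()`, so it illustrates the idea only.

```r
set.seed(3)

# Two groups: rank-sum (normal approximation, no continuity correction)
# and Kruskal-Wallis are the same test.
two_groups <- data.frame(
  value = c(rnorm(40, mean = 0), rnorm(40, mean = 0.5)),
  group = factor(rep(c("a", "b"), each = 40))
)
p_ranksum <- wilcox.test(value ~ group, data = two_groups,
                         exact = FALSE, correct = FALSE)$p.value
p_kw_two  <- kruskal.test(value ~ group, data = two_groups)$p.value
print(c(rank_sum = p_ranksum, kruskal_wallis = p_kw_two))

# Four groups: Kruskal-Wallis has (number of groups - 1) degrees of freedom.
four_groups <- data.frame(
  value = rnorm(120),
  group = factor(rep(c("a", "b", "c", "d"), each = 30))
)
kw_four <- kruskal.test(value ~ group, data = four_groups)
print(kw_four$parameter)
```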
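Let’s see a few examples of logistic regression. Wrapping the outcome in “as.factor” gets this code running, but with a factor outcome R treats the first level (here the smallest observed value) as “failure” and every other level as “success”, so these models do not describe phthalate level in a meaningful way. The very large coefficients and the 22 and 24 Fisher scoring iterations in the output are symptoms of this. A logistic regression needs a genuinely binary outcome.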
logit1 <- (svyglm(as.factor(log(monoEthyl))~as.factor(refED)+RIDAGEYR, family=quasibinomial, design=nhc, na.action = na.omit))
summary(logit1)
##
## Call:
## svyglm(formula = as.factor(log(monoEthyl)) ~ as.factor(refED) +
## RIDAGEYR, design = nhc, family = quasibinomial, na.action = na.omit)
##
## Survey design:
## svydesign(id = ~SDMVPSU, weights = ~WTINT2YR, strata = ~SDMVSTRA,
## nest = TRUE, survey.lonely.psu = "adjust", data = fullNHANES_recat)
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 24.75845 0.81423 30.407 <2e-16
## as.factor(refED)partial college and below -16.42189 0.48371 -33.950 <2e-16
## RIDAGEYR -0.02203 0.01672 -1.317 0.19
##
## (Intercept) ***
## as.factor(refED)partial college and below ***
## RIDAGEYR
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for quasibinomial family taken to be 0.7029368)
##
## Number of Fisher Scoring iterations: 22
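A logistic regression needs a two-level outcome. One option is to dichotomise the concentration first and model that indicator. The sketch below uses simulated data and plain `glm()`. The variable names and the median cut-point are assumptions made for illustration, not choices taken from this analysis.

```r
set.seed(4)
n <- 500

# Hypothetical predictors and a log-normal "concentration".
edu <- factor(sample(c("college grad", "partial college and below"),
                     n, replace = TRUE))
age_years <- runif(n, min = 20, max = 80)
conc <- exp(rnorm(n, mean = 4 + 0.3 * (edu == "partial college and below")))

# Binary outcome: 1 if above the sample median (an assumed cut-point).
high_conc <- as.integer(conc > median(conc))

fit_logit <- glm(high_conc ~ edu + age_years, family = binomial)

# Exponentiated coefficients are odds ratios.
odds_ratios <- exp(coef(fit_logit))
print(round(odds_ratios, 2))

# By contrast, as.factor() on a continuous variable creates roughly one
# level per distinct value, which is not a binary outcome.
y_factor <- as.factor(log(conc))
print(nlevels(y_factor))
```

With the survey design, the same indicator could be added to the data before calling `svydesign()` and then used as the outcome in `svyglm(..., family = quasibinomial)`.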
subset1 <- subset(nhc, RIDAGEYR > 19)
logit2 <- (svyglm(as.factor(log(monoEthyl))~as.factor(refED)+RIDAGEYR+as.factor(ethnicity)+as.factor(fpl)+as.factor(citizenship), family=quasibinomial, design=subset1, na.action = na.omit))
summary(logit2)
##
## Call:
## svyglm(formula = as.factor(log(monoEthyl)) ~ as.factor(refED) +
## RIDAGEYR + as.factor(ethnicity) + as.factor(fpl) + as.factor(citizenship),
## design = subset1, family = quasibinomial, na.action = na.omit)
##
## Survey design:
## subset(nhc, RIDAGEYR > 19)
##
## Coefficients:
## Estimate Std. Error
## (Intercept) 59.98801 1.27065
## as.factor(refED)partial college and below -19.14015 0.65048
## RIDAGEYR 0.01573 0.02980
## as.factor(ethnicity)Non-Hispanic Black 0.46688 0.13690
## as.factor(ethnicity)Non-Hispanic White -17.16374 0.58293
## as.factor(ethnicity)Other Hispanic 0.13812 0.18767
## as.factor(ethnicity)Other or Multi 0.23037 0.23507
## as.factor(fpl)family income 2x poverty threshold 0.24851 0.17770
## as.factor(fpl)family income 3x poverty threshold -18.53777 0.73302
## as.factor(fpl)family income 4x poverty threshold 0.50858 0.21199
## as.factor(fpl)family income 5x poverty threshold 0.41368 0.19092
## as.factor(fpl)family income more than 5x poverty threshold -18.77863 0.87613
## as.factor(citizenship)not U,S, citizen 15.53777 0.61753
## t value Pr(>|t|)
## (Intercept) 47.211 < 2e-16 ***
## as.factor(refED)partial college and below -29.424 < 2e-16 ***
## RIDAGEYR 0.528 0.598645
## as.factor(ethnicity)Non-Hispanic Black 3.410 0.000903 ***
## as.factor(ethnicity)Non-Hispanic White -29.444 < 2e-16 ***
## as.factor(ethnicity)Other Hispanic 0.736 0.463291
## as.factor(ethnicity)Other or Multi 0.980 0.329191
## as.factor(fpl)family income 2x poverty threshold 1.398 0.164730
## as.factor(fpl)family income 3x poverty threshold -25.290 < 2e-16 ***
## as.factor(fpl)family income 4x poverty threshold 2.399 0.018087 *
## as.factor(fpl)family income 5x poverty threshold 2.167 0.032376 *
## as.factor(fpl)family income more than 5x poverty threshold -21.434 < 2e-16 ***
## as.factor(citizenship)not U,S, citizen 25.161 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for quasibinomial family taken to be 0.1814485)
##
## Number of Fisher Scoring iterations: 24
logit3 <- (svyglm(as.factor(log(monoEthyl))~as.factor(adultED)+RIDAGEYR+as.factor(fpl), family=quasibinomial, design=subset1, na.action = na.omit))
summary(logit3)
##
## Call:
## svyglm(formula = as.factor(log(monoEthyl)) ~ as.factor(adultED) +
## RIDAGEYR + as.factor(fpl), design = subset1, family = quasibinomial,
## na.action = na.omit)
##
## Survey design:
## subset(nhc, RIDAGEYR > 19)
##
## Coefficients:
## Estimate
## (Intercept) 41.959437
## as.factor(adultED)college grad or above 1.690212
## as.factor(adultED)high school grad/GED -17.539382
## as.factor(adultED)less than 9th grade -0.737155
## as.factor(adultED)some college or AA -17.604884
## RIDAGEYR 0.008761
## as.factor(fpl)family income 2x poverty threshold 0.171777
## as.factor(fpl)family income 3x poverty threshold -18.674215
## as.factor(fpl)family income 4x poverty threshold 0.370404
## as.factor(fpl)family income 5x poverty threshold 0.259681
## as.factor(fpl)family income more than 5x poverty threshold -18.969089
## Std. Error t value
## (Intercept) 1.318773 31.817
## as.factor(adultED)college grad or above 0.582902 2.900
## as.factor(adultED)high school grad/GED 0.764768 -22.934
## as.factor(adultED)less than 9th grade 0.509135 -1.448
## as.factor(adultED)some college or AA 0.842382 -20.899
## RIDAGEYR 0.030038 0.292
## as.factor(fpl)family income 2x poverty threshold 0.157567 1.090
## as.factor(fpl)family income 3x poverty threshold 0.697581 -26.770
## as.factor(fpl)family income 4x poverty threshold 0.303998 1.218
## as.factor(fpl)family income 5x poverty threshold 0.319749 0.812
## as.factor(fpl)family income more than 5x poverty threshold 0.997105 -19.024
## Pr(>|t|)
## (Intercept) < 2e-16 ***
## as.factor(adultED)college grad or above 0.00448 **
## as.factor(adultED)high school grad/GED < 2e-16 ***
## as.factor(adultED)less than 9th grade 0.15040
## as.factor(adultED)some college or AA < 2e-16 ***
## RIDAGEYR 0.77106
## as.factor(fpl)family income 2x poverty threshold 0.27793
## as.factor(fpl)family income 3x poverty threshold < 2e-16 ***
## as.factor(fpl)family income 4x poverty threshold 0.22557
## as.factor(fpl)family income 5x poverty threshold 0.41840
## as.factor(fpl)family income more than 5x poverty threshold < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for quasibinomial family taken to be 0.2117443)
##
## Number of Fisher Scoring iterations: 24
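We can also get a Wald test for a single term in the model with the regTermTest function. Here we test RIDAGEYR in logit2. Because this term has one degree of freedom, the result agrees with the t test for RIDAGEYR in the summary of logit2 (p = 0.599).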
regTermTest(logit2, ~RIDAGEYR)
## Wald test for RIDAGEYR
## in svyglm(formula = as.factor(log(monoEthyl)) ~ as.factor(refED) +
## RIDAGEYR + as.factor(ethnicity) + as.factor(fpl) + as.factor(citizenship),
## design = subset1, family = quasibinomial, na.action = na.omit)
## F = 0.2786305 on 1 and 112 df: p= 0.59864
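A Wald test is most useful for a term that spans several coefficients, such as a factor with more than two levels, where no single t test in the summary answers the question. The following is a minimal sketch of that case. It is an illustration only: it uses the api example data that ships with the survey package rather than the NHANES design, and the object names dclus1, fit, and wald_stype are placeholders.

```r
library(survey)

# Example data bundled with the survey package (not the NHANES data)
data(api)

# One-stage cluster sample of school districts
dclus1 <- svydesign(id = ~dnum, weights = ~pw, data = apiclus1, fpc = ~fpc)

# Survey-weighted logistic regression; stype is a three-level factor (E, H, M)
fit <- svyglm(I(sch.wide == "Yes") ~ ell + meals + stype,
              design = dclus1, family = quasibinomial())

# Joint Wald test that all stype coefficients are zero
wald_stype <- regTermTest(fit, ~stype)
wald_stype

# Several terms can be tested together in one call
regTermTest(fit, ~ell + meals)
```

The same call pattern should apply to a multi-level term in the models above, for example a formula naming the fpl term exactly as it appears in the model.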
Linear regression reports an R-squared value, but logistic regression uses a pseudo-R-squared instead. There are many different versions of pseudo-R-squared, and the psrsq function provides two of them: Cox-Snell and Nagelkerke.
psrsq(logit2, method = c("Cox-Snell"))
## [1] 0.001953773
psrsq(logit2, method = c("Nagelkerke"))
## [1] 0.2118337
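The two statistics are related: in the usual definitions, the Nagelkerke version rescales the Cox-Snell version by its maximum attainable value, so it is the larger of the two. The sketch below computes both side by side. It assumes the api example data from the survey package in place of the NHANES design, and dclus2, fit, cs, and nk are placeholder names.

```r
library(survey)

# Example data bundled with the survey package (not the NHANES data)
data(api)

# Two-stage cluster sample: districts, then schools within districts
dclus2 <- svydesign(id = ~dnum + snum, weights = ~pw, data = apiclus2)

# Survey-weighted logistic regression
fit <- svyglm(I(sch.wide == "Yes") ~ ell + meals + mobility,
              design = dclus2, family = quasibinomial())

# The two pseudo-R-squared versions offered by psrsq
cs <- psrsq(fit, method = "Cox-Snell")
nk <- psrsq(fit, method = "Nagelkerke")
c(CoxSnell = cs, Nagelkerke = nk)
```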
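Below is an example of an ordered logistic regression fit with the svyolr function. Note that the outcome variable must be a factor, and its levels are treated as ordered. The ethnicity variable has no natural order (as.factor simply sorts its levels alphabetically, as the intercept labels show), so this model illustrates the syntax only.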
ologit1 <- svyolr(as.factor(ethnicity)~as.factor(gender)+as.factor(citizenship)+RIDAGEYR, design = nhc, method = c("logistic"))
summary(ologit1)
## Call:
## svyolr(as.factor(ethnicity) ~ as.factor(gender) + as.factor(citizenship) +
## RIDAGEYR, design = nhc, method = c("logistic"))
##
## Coefficients:
## Value Std. Error t value
## as.factor(gender)male -0.028130504 0.0254625109 -1.104781
## as.factor(citizenship)not U,S, citizen -0.511256670 0.2262551060 -2.259647
## RIDAGEYR 0.006759457 0.0007330932 9.220461
##
## Intercepts:
## Value Std. Error t value
## Mexican American|Non-Hispanic Black -2.0617 0.0842 -24.4983
## Non-Hispanic Black|Non-Hispanic White -1.0566 0.0735 -14.3677
## Non-Hispanic White|Other Hispanic 2.1920 0.0695 31.5463
## Other Hispanic|Other or Multi 2.8355 0.0673 42.1320
## (42 observations deleted due to missingness)
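For comparison, here is a minimal sketch with an outcome that does have a natural order. It is not part of the NHANES analysis: it uses the api example data from the survey package, and the variable mealcat (the percentage of students eligible for subsidized meals, cut into four bands) and the object names dclus1 and ofit are introduced only for illustration.

```r
library(survey)

# Example data bundled with the survey package (not the NHANES data)
data(api)

dclus1 <- svydesign(id = ~dnum, weights = ~pw, data = apiclus1, fpc = ~fpc)

# Build an ordered outcome by cutting a percentage into four bands
dclus1 <- update(dclus1,
                 mealcat = cut(meals, c(0, 25, 50, 75, 100),
                               ordered_result = TRUE))

# Proportional-odds (ordered logistic) model; "logistic" is the default method
ofit <- svyolr(mealcat ~ avg.ed + mobility + stype, design = dclus1)
summary(ofit)
```

With an ordered outcome, each intercept is the cut point between two adjacent bands, and a positive coefficient shifts observations toward the higher bands.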
Many more types of analyses are available in the survey package and in other packages that work with complex survey data. A few examples:
Principal components analysis (PCA).
pc <- svyprcomp(~monoEthyl+gender+refED, design=nhc,scale=TRUE,scores=TRUE)
pc
## Standard deviations (1, .., p=4):
## [1] 1.3128311 1.0579079 0.9720452 0.4609050
##
## Rotation (n x k) = (4 x 4):
## PC1 PC2 PC3 PC4
## monoEthyl -0.004235365 0.5636092 -0.82582215 0.01856026
## genderfemale -0.711199991 0.1409689 0.08449453 -0.68350788
## gendermale 0.701706650 0.1938654 0.11351522 -0.67612002
## refEDpartial college and below -0.042242310 0.7904990 0.54588712 0.27447078
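In the rotation above, the factor gender enters as two indicator columns (genderfemale and gendermale), so the components partly reflect that coding. The sketch below shows the same function on numeric variables only, plus the share of variance carried by each component. It assumes the api example data from the survey package rather than the NHANES design, and dclus2, pc_num, and var_explained are placeholder names.

```r
library(survey)

# Example data bundled with the survey package (not the NHANES data)
data(api)

# Two-stage cluster sample with finite population corrections
dclus2 <- svydesign(id = ~dnum + snum, fpc = ~fpc1 + fpc2, data = apiclus2)

# Design-weighted PCA on numeric variables, scaled to unit variance
pc_num <- svyprcomp(~api99 + api00 + ell + hsg + meals + emer,
                    design = dclus2, scale = TRUE, scores = TRUE)

# Proportion of total variance carried by each component
var_explained <- pc_num$sdev^2 / sum(pc_num$sdev^2)
round(var_explained, 3)
```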
Cronbach’s alpha.
svycralpha(~log(monoEthyl)+RIDAGEYR, design=nhc, na.rm = TRUE)
## *alpha*
## 0.01325312